Re: Overwriting _id column in nosql document store a good idea?
Hi David,
I think you are kind of answering your own question. The "_id" field plays an important role, for sure.
From the application standpoint, it's used mostly in lookups, as you mention, which then trickles down to domain specific operations like
"collection.getOne()", "collection.replaceOne()" and "collection,addOrReplaceOne()".
However, in the end, the fact that MySQL auto-generates one is mostly for convenience. Applications are free to override that behaviour by
providing their own value. Somethings need to be had into account though both in terms of functionality and in terms of performance.
A collection is (in its most simple and relevant form) a MySQL table composed by two columns, "_id" and "doc" (and additionally "_json_schema",
which is not relevant for this discussion). "_id" is a VARBINARY stored virtual column that is auto-generated when the collection is created,
"doc" is a JSON value that contains the "raw" document. Every time a document is added to a collection, the "_id" column is populated with the
value of "_id" provided for that document, or if one is not provided, the auto-generated "_id" value will be used instead.
Also, a BTREE index is automatically created for the "_id" column, which is relevant if you want to perform online schema changes or any sort
of migration that might affect that specific column.
From a performance standpoint, it's important to be aware that non-sequential primary keys are not the best idea with InnoDB and that will have
an impact when inserting documents. So, monotonically increasing and globaly unique ids (such as the ones auto-generated by MySQL) are the best
way to avoiid page-splits and tree-reorgs for that.
Hope it helps.