The specification states that the ID for a row, obtained for example in Python with
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
res = mycol.insert_one({"name": "John", "address": "Highway 37"})
print(res.inserted_id)
is:
- a 4-byte timestamp value, representing the ObjectId's creation, measured in seconds since the Unix epoch
- a 5-byte random value
- a 3-byte incrementing counter, initialized to a random value
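To make the layout concrete, here is a small sketch (not using pymongo) that splits a 24-character hex ObjectId into those three fields; the example ID is made up for illustration:

```python
import struct
from datetime import datetime, timezone

def decode_object_id(hex_id: str):
    """Split a 12-byte ObjectId into (timestamp, random, counter) per the spec above."""
    raw = bytes.fromhex(hex_id)
    assert len(raw) == 12
    # First 4 bytes: big-endian seconds since the Unix epoch.
    (ts,) = struct.unpack(">I", raw[:4])
    # Next 5 bytes: the per-process random value.
    rand = int.from_bytes(raw[4:9], "big")
    # Last 3 bytes: the incrementing counter.
    counter = int.from_bytes(raw[9:12], "big")
    return datetime.fromtimestamp(ts, tz=timezone.utc), rand, counter

# A made-up ObjectId, purely for illustration:
created, rand, counter = decode_object_id("6578b2f0a1b2c3d4e5f67890")
```

Note that only the first 4 bytes carry any ordering information, which is why IDs created within the same second do not sort by insertion order.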
Given that, I understand that lexicographical order over IDs is only partially useful: it does not tell you whether one record was inserted before another when both fall within the same second.
Question: For which reason would an "incremental counter" be useful in the context of an ID for which the previous bytes are non-incremental?
More precisely: why are 5 random bytes + 3 counter bytes more likely to be unique than 8 random bytes (a 64-bit random value)?
That is the new spec. The original spec used a timestamp + a machine-derived value + PID + counter. This theoretically provided uniqueness by tagging each generated ID with a value unique to the instance creating it.
However, it was determined that when services are started automatically on system reboot, subsequent restarts would very often wind up with the same PID. With multiple identical systems, especially VMs, it was possible for several of them to have the same PID.
A random value does a better job of ensuring uniqueness. Having each instance select a random value only once, and using that value for the duration it is running reduces the number of chances for 2 instances to have the same random value.
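A rough birthday-bound calculation illustrates why drawing the random value once per process helps. The numbers below (10 million IDs in one second, 100 processes) are made-up assumptions for the sake of comparison:

```python
import math

def birthday_collision_prob(n: int, space: int) -> float:
    """Approximate probability that n uniform random draws from `space`
    possible values contain at least one duplicate (birthday bound)."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * space))

# Scheme A: every ID gets its own independent 64-bit random part,
# so every pair of the 10 million IDs is a collision opportunity.
p_per_id = birthday_collision_prob(10_000_000, 2**64)

# Scheme B: 100 processes each draw one 40-bit value at startup; the
# counter keeps IDs within a process unique, so only the 100 draws
# themselves can collide.
p_per_process = birthday_collision_prob(100, 2**40)
```

With per-process randomness there are only as many "draws" as there are processes, not as many as there are IDs, so the collision probability is far smaller even though the random field is 3 bytes shorter.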
The counter simply permits each instance to generate up to 2^24 unique values per second, with no chance of a value being repeated within that second.
Starting the counter at a random value helps mitigate the not-quite-zero chance that 2 instances generated the same 5-byte random value: even then, the two instances will almost certainly be at different points in the counter sequence.
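Putting the pieces together, a minimal sketch of this generation scheme might look like the following. This is an illustrative toy, not MongoDB's actual implementation; the names are my own:

```python
import os
import time
import threading

# Chosen once at process startup, reused for the process's lifetime.
_PROCESS_RANDOM = os.urandom(5)
# 3-byte counter, seeded at a random starting point.
_counter = int.from_bytes(os.urandom(3), "big")
_lock = threading.Lock()

def new_object_id() -> bytes:
    """Return 12 bytes: 4-byte timestamp + 5-byte process random + 3-byte counter."""
    global _counter
    with _lock:
        _counter = (_counter + 1) % (1 << 24)  # wraps after 2**24 IDs
        count = _counter
    ts = int(time.time()).to_bytes(4, "big")
    return ts + _PROCESS_RANDOM + count.to_bytes(3, "big")
```

Two IDs from the same process can only collide if the counter wraps all the way around within a single second; two IDs from different processes can only collide if both the 5-byte random values and the counter positions coincide in the same second.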