google-cloud-datastore

Choosing hash prefix for indexed value in GCP Datastore


Given a monotonically increasing property that should be used as a key, or only for exact-match queries, we plan to avoid index contention by prepending a computed hash to the property value.

Practical example: importing data from an RDBMS, where the document id is sequential and should be used for lookups. So we calculate a hash of the id and store {hash}|{id}.

If this approach is sound, what size of hash would you recommend? For example, if we take the first 4 bytes of SHA-1, would that be enough for effective index tablet splitting? I could not find any information on this subject. Thank you in advance!
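
For illustration, here is a minimal sketch of the {hash}|{id} scheme in Python. The `prefixed_key` helper name and the hex encoding are assumptions for the example; the 4-byte SHA-1 prefix mirrors the question.

```python
import hashlib

def prefixed_key(doc_id: int, prefix_bytes: int = 4) -> str:
    """Prepend the first `prefix_bytes` bytes of SHA-1(id), hex-encoded,
    so that sequential ids scatter across the index keyspace."""
    digest = hashlib.sha1(str(doc_id).encode("utf-8")).hexdigest()
    # Each byte is two hex characters.
    return f"{digest[:prefix_bytes * 2]}|{doc_id}"

# Consecutive ids such as prefixed_key(1000001) and prefixed_key(1000002)
# get unrelated prefixes and are very likely to fall in different index ranges.
```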


Solution

  • The size of the prefix is proportional to the maximum traffic you want to handle. Following the 500/50/5 rule, you'll want at least as many prefixes as your write rate divided by 500. So if you are doing fewer than 500 writes/s, there is no need for a hash. If you are doing 1M writes/s with sequential ids, you'll want 2 base64 prefix characters, since 1M / 500 = 2,000 shards and 64² = 4,096 ≥ 2,000 (see the sketch below).

    You'll also want to ramp up your total traffic following the 500/50/5 rule.
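
As a quick sanity check on the sizing above, here is a small helper that turns a target write rate into a prefix length. The `prefix_chars_needed` name and the base64 alphabet size are assumptions for the example; the 500 writes/s per shard figure comes from the 500/50/5 rule.

```python
import math

def prefix_chars_needed(writes_per_second: float, alphabet_size: int = 64) -> int:
    """Number of prefix characters needed so that each prefix (shard)
    stays under roughly 500 writes/s, per the 500/50/5 rule."""
    shards = math.ceil(writes_per_second / 500)
    if shards <= 1:
        return 0  # under 500 writes/s: no hash prefix needed
    return math.ceil(math.log(shards, alphabet_size))

# 1,000,000 writes/s -> 2,000 shards -> 2 base64 characters (64**2 = 4,096 >= 2,000)
print(prefix_chars_needed(1_000_000))  # 2
print(prefix_chars_needed(400))        # 0
```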