google-cloud-datastore

Choosing hash prefix for indexed value in GCP Datastore


Given a monotonically increasing property that should be used as a key, or only for exact-match queries, we plan to avoid index contention by prepending a computed hash to the property value.

Practical example: importing data from an RDBMS, where the document id is sequential and should be used for lookups. So we calculate a hash of the id and store {hash}|{id}.

If this approach is sound, what size of hash would you recommend? For example, if we take the first 4 bytes of SHA-1, would that be enough for effective index tablet splitting? I could not find any information on this subject. Thank you in advance!
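
For illustration, here is a minimal sketch of the {hash}|{id} scheme in Python. The `prefixed_key` helper name and the hex encoding are assumptions for the example; the 4-byte SHA-1 prefix mirrors the question.

```python
import hashlib

def prefixed_key(doc_id: int, prefix_bytes: int = 4) -> str:
    """Prepend the first `prefix_bytes` bytes of SHA-1(id), hex-encoded,
    so that sequential ids scatter across the index keyspace."""
    digest = hashlib.sha1(str(doc_id).encode("utf-8")).hexdigest()
    # Each byte is two hex characters.
    return f"{digest[:prefix_bytes * 2]}|{doc_id}"

# Consecutive ids such as prefixed_key(1000001) and prefixed_key(1000002)
# get unrelated prefixes and are very likely to fall in different index ranges.
```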


Solution

  • The size of the prefix is proportional to the maximum traffic you want to handle. Following the 500/50/5 rule, you'll want at least as many prefixes as your write rate divided by 500. So if you are doing fewer than 500 writes/s, there is no need for a hash. If you are doing 1M writes/s with sequential ids, you'll want 2 base64 prefix characters, since 1M / 500 = 2,000 shards and 64² = 4,096 ≥ 2,000 (see the sketch below).

    You'll also want to ramp up your total traffic following the 500/50/5 rule.
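
As a quick sanity check on the sizing above, here is a small helper that turns a target write rate into a prefix length. The `prefix_chars_needed` name and the base64 alphabet size are assumptions for the example; the 500 writes/s per shard figure comes from the 500/50/5 rule.

```python
import math

def prefix_chars_needed(writes_per_second: float, alphabet_size: int = 64) -> int:
    """Number of prefix characters needed so that each prefix (shard)
    stays under roughly 500 writes/s, per the 500/50/5 rule."""
    shards = math.ceil(writes_per_second / 500)
    if shards <= 1:
        return 0  # under 500 writes/s: no hash prefix needed
    return math.ceil(math.log(shards, alphabet_size))

# 1,000,000 writes/s -> 2,000 shards -> 2 base64 characters (64**2 = 4,096 >= 2,000)
print(prefix_chars_needed(1_000_000))  # 2
print(prefix_chars_needed(400))        # 0
```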