I'm using Google BigTable to store event log data according to the following constraints:
Obviously, I'd like to avoid extra steps on every read or write, so I was thinking of hashing usernames with SHA1 (note: hashing, not encrypting) before using them as part of the row key in BigTable. As a result, all keys in BigTable will be formatted like this:
cf23df2207d99a74fbe169e3eba035e633b65d94_2018_01_30_15090001
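A minimal sketch of how such a row key could be built (the `make_row_key` helper and its field names are my own illustration, not an established API):

```python
import hashlib

def make_row_key(username: str, date: str, seq: str) -> str:
    """Build a row key of the form <sha1(username)>_<date>_<seq>."""
    # SHA-1 is a hash, not encryption: it spreads usernames uniformly
    # over the keyspace, and all rows for one user share the same prefix.
    digest = hashlib.sha1(username.encode("utf-8")).hexdigest()
    return f"{digest}_{date}_{seq}"

key = make_row_key("alice", "2018_01_30", "15090001")
```

Because the hash prefix is identical for every event from the same user, those rows still sort contiguously while different users land at effectively random points in the keyspace.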
We know that SHA1 output is uniformly distributed, so given that, is it safe to assume that my records will be spread evenly across nodes, while all rows for a given username still sort together? Will this in effect prevent hotspotting? Are there any edge cases in this approach that I've missed?
Assuming that traffic per user is well distributed (i.e. no single user generates more than 10K operations per second), this approach should be fine.
FYI, Cloud Bigtable measures operations in rows per second, and you should size your cluster for peak throughput. Each node can support 10,000 simple reads or writes per second. Our smallest production configuration is 3 nodes, which can support up to 30,000 rows per second (about 2.6 billion rows per day if used continuously at that maximum).
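The sizing arithmetic above can be sketched as follows (the 10,000 rows/sec/node and 3-node minimum come from the figures quoted; the helper itself is illustrative):

```python
import math

ROWS_PER_SEC_PER_NODE = 10_000  # simple reads or writes per node, per the answer
MIN_PRODUCTION_NODES = 3        # smallest production configuration

def min_nodes(peak_rows_per_sec: int) -> int:
    # Round up to whole nodes, never below the production minimum
    return max(MIN_PRODUCTION_NODES,
               math.ceil(peak_rows_per_sec / ROWS_PER_SEC_PER_NODE))

# 3 nodes running flat out for a day:
rows_per_day = MIN_PRODUCTION_NODES * ROWS_PER_SEC_PER_NODE * 86_400
# → 2,592,000,000, i.e. the ~2.6 billion rows/day figure above
```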