Amazon's DynamoDB documentation appears to be deliberately cagey about how a partition is selected for a row. Here is the discussion about the Partition Key (emphasis mine):
Partition key – A simple primary key, composed of one attribute known as the partition key.
DynamoDB uses the partition key's value as input to an internal hash function. The output from the hash function determines the partition (physical storage internal to DynamoDB) in which the item will be stored.
In a table that has only a partition key, no two items can have the same partition key value.
The People table described in Tables, Items, and Attributes is an example of a table with a simple primary key (PersonID). You can access any item in the People table immediately by providing the PersonId value for that item.
So the example given has PersonID as a number, which can be grand or dismal for hashing - depending on that internal hash function.
On my project, we're using a random v4 UUID for our primary key, and currently we're persisting that UUID in String/S form (with the dashes included). It occurs to me that, like an integer, this UUID string can hash beautifully or dismally depending on that internal hash function.
Persisting UUIDs as strings is convenient for us (albeit wasteful space-wise) because we can view/query the UUIDs in the DynamoDB console in the same v4 format that appears in our application's logs. BUT, if persisting our UUIDs in String/S form rather than Binary/B form is going to lead to horrible aliasing of our rows to just one or two partitions because the internal hash function is naive about converting our UUID string to bytes, then convenience be damned and Binary/B form is best for UUID.
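For concreteness, here is a minimal sketch of what switching to Binary/B form would involve on the Java side: packing a UUID into its 16 raw bytes and reading it back. The class name UuidBytes and its two methods are hypothetical helpers, not part of the AWS SDK; the sketch uses only java.util.UUID and java.nio.ByteBuffer, and it assumes the usual big-endian layout (most significant 8 bytes first).

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper (not an AWS SDK class): converts between a UUID
// and its 16-byte binary form, most significant bits first.
final class UuidBytes {
    private UuidBytes() {}

    /** Packs the UUID into a 16-byte buffer that is ready for reading. */
    static ByteBuffer toBinary(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        buf.flip(); // switch from writing to reading
        return buf;
    }

    /** Rebuilds the UUID from a 16-byte buffer without moving its position. */
    static UUID fromBinary(ByteBuffer bytes) {
        ByteBuffer buf = bytes.duplicate(); // leave the caller's buffer untouched
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }
}
```

The space difference is 36 characters for the dashed string versus 16 bytes for the binary form. Whether the binary form also hashes any better is exactly the open question here.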
So, I would like to know more about the internal hash function (from the Dynamo developers themselves, preferably.) Pray give us details as to the level of smarts in that internal hash function. How does it behave with String/S, Number/N, and Binary/B types?
Does the internal hash function recognize we're passing a v4 UUID formatted string and automagically hash on the binary form of that UUID? Or does it simply hash the string's characters as given?
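To make the concern measurable, here is a small experiment, offered as a sketch only. DynamoDB's internal hash function is not documented, so MD5 is used purely as a stand-in for "some general-purpose hash over the string's UTF-8 bytes"; the class HashSpread, the bucket count, and the bucketing rule are all assumptions made for illustration and say nothing about how DynamoDB actually assigns partitions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.UUID;

// Hypothetical experiment: how evenly do dashed UUID strings spread over
// N buckets when hashed as plain bytes? MD5 is a stand-in, NOT DynamoDB's hash.
final class HashSpread {
    private HashSpread() {}

    /** Maps a key to a bucket using the first 4 bytes of its MD5 digest. */
    static int bucketOf(String key, int buckets) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            int h = ((d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16)
                  | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
            return Math.floorMod(h, buckets); // floorMod keeps the result non-negative
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the digests every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** Counts how many random v4 UUID strings land in each bucket. */
    static int[] histogram(int keys, int buckets) {
        int[] counts = new int[buckets];
        for (int i = 0; i < keys; i++) {
            counts[bucketOf(UUID.randomUUID().toString(), buckets)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(histogram(100_000, 8)));
    }
}
```

With a hash of this kind the dashes and the hex encoding do not matter: the counts come out close to equal per bucket. The aliasing I'm worried about would only arise if the real function were much weaker than a general-purpose hash, which is the part I cannot verify from the documentation.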
If the String/S key hashing algorithm is naive by default, is there any programmatic way to hint to Dynamo that my String key is a UUID and have it hash on the binary form as such? I'm using the AWS SDK for Java with the DynamoDBMapper to access my tables and I can sprinkle additional annotations on my entities wherever you direct. I control my own table definition as well via DynamoDB schema JSON configurations and can make changes there as needed.
I am not a developer on the DynamoDB team, but I'll still try to answer as best I can.