I would like to retain the full history of database writes on a specific table. The main usage of this table is to read the latest data, but full auditability of all inserts and updates is also a business requirement. The official Spanner documentation calls out several schema anti-patterns, one of which is using monotonically increasing data as a primary key. It notes that changing the order of the primary key columns can spread the load, and it also suggests the use of hashing, sharding by modulo, UUIDs, etc.
This Google Cloud blog post mentions that using a ShardId instead of a timestamp hash is preferred:
Note, however, that using a simple hash will make queries by timestamp range extremely slow, since retrieving a range of timestamps will require a full table scan to cover all the hashes. Instead, we recommend generating a ShardId from the timestamp.
The example table setup is provided, along with a query that uses the TimestampShardId:
TimestampShardId = CRC32(Timestamp) % 100
CREATE TABLE Events (
  TimestampShardId INT64 NOT NULL,
  Timestamp TIMESTAMP NOT NULL,
  event_info...
) PRIMARY KEY (TimestampShardId, Timestamp DESC)
SELECT * FROM Events
WHERE
  TimestampShardId BETWEEN 0 AND 99
  AND Timestamp > @lower_bound
  AND Timestamp < @upper_bound;
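For context, the TimestampShardId formula above would typically be computed client-side before each write. A minimal sketch in Python, assuming CRC32 is taken over the timestamp's RFC 3339 string form (the exact serialization is an assumption — any deterministic encoding works as long as all writers agree on it):

```python
import zlib
from datetime import datetime, timezone

NUM_SHARDS = 100  # cardinality of TimestampShardId, matching the % 100 above

def timestamp_shard_id(ts: datetime, num_shards: int = NUM_SHARDS) -> int:
    """Derive a shard id from a timestamp: CRC32(Timestamp) % num_shards."""
    encoded = ts.isoformat().encode("utf-8")
    return zlib.crc32(encoded) % num_shards

# Two writes with close but distinct timestamps usually land on
# different shards, which is what spreads the write load.
shard = timestamp_shard_id(datetime(2024, 1, 1, tzinfo=timezone.utc))
print(shard)  # some value in [0, 100)
```

The application would store this value in the TimestampShardId column on insert, and the read query then enumerates all shards (BETWEEN 0 AND 99) for a given timestamp range.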
I'm failing to understand how this TimestampShardId makes the scan faster than simple hashing. Both approaches seem to need to scan the entire table — could anyone walk me through why the ShardId is preferred? For example, in order to get the full history, would a history table with a hash of the timestamp as its primary key cause issues? What about a primary key of UUID plus timestamp?
The idea is that Cloud Spanner can avoid a full sort of the Events table by performing a distributed union over each value of TimestampShardId and then reading keys in order for that shard.
Think of this as the complexity of merging N sorted lists compared to doing a full sort. If N is small, the merge will be relatively efficient. On the other hand, as N approaches the number of items in the list, the performance degrades to that of a full sort.
By using a different cardinality of TimestampShardId, you can trade off between write scalability and query performance -- more shards allow for more write concurrency, at the expense of more data to process in the merge step during the query. We recommend performance testing your specific workload with different numbers of shards to see what point in this space is optimal for you.