Search code examples
google-cloud-platformgoogle-cloud-spanner

Cloud Spanner - Sharding based on String key


Our main goal is read performance as we typically do 80-90% reads to writes and I am trying to select the best primary key to achieve this goal.

The majority of our queries are based on customers e.g. get all products offered by customer X. Clients are identified by a three character string, similar to a ticker symbol.

My first thought is to create the primary key as follows:

CREATE TABLE Product (
    ProductId INT64 NOT NULL,
    CustomerCode STRING(3) NOT NULL,
    ...
) PRIMARY KEY (CustomerCode, ProductId);

My questions on this design are:

  • How will Spanner shard data based on a string value?

  • Will this approach create too many shards? We have about 100 key accounts and we are planning on using ~8 nodes


Solution

  • Spanner does automatic load-based sharding of row ranges based on the primary key.

    "Will this approach create too many shards?"

    Unlike many other distributed databases, this isn't something you need to worry about. Since Cloud Spanner does dynamic load based sharding, it will automatically adjust up and down the number of serving shards based upon the load you place on the system. Row ranges that get more load will split into more shards and row ranges that get less load will automatically consolidate to fewer shards.

    A good resource to read is Optimizing Schema Design for Cloud Spanner.