google-cloud-platform, google-cloud-spanner

Cloud Spanner: Implication of split being 'too big'


The documentation states that one split should not be bigger than 'a few GB'.

  • Is there a hard limit at which Cloud Spanner will stop storing more data in one split?
  • What is the implication of, e.g., splits growing to 20-30GB?
    • I can think of problems when those splits need to be moved around between instances while being read/written.

I know the second point sounds like we should split up our primary key or add a sharding key as the first part of the primary key.

But we have hundreds of customers with really big product catalogs, and we need to interleave the brand and category tables so we can join on them. Alternative approaches that store one product catalog across several splits become very slow on secondary index queries (for example: query all active products in a catalog).
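To make the trade-off concrete, here is a minimal sketch of the sharding-key idea in plain Python. It does not call any Spanner API; the shard count, the key layout (shard_id, catalog_id, product_id), and all function names are assumptions chosen for illustration. It shows how a hash-derived shard id spreads one catalog over several key ranges, and why a query such as "all active products in a catalog" then has to cover every shard prefix.

```python
import hashlib

# Hypothetical shard count; the right value depends on catalog sizes.
NUM_SHARDS = 16


def shard_for(product_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Derive a stable shard id from the product id."""
    digest = hashlib.sha256(product_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


def primary_key(catalog_id: str, product_id: str) -> tuple:
    """Assumed key layout: the sharding key is the first primary-key part."""
    return (shard_for(product_id), catalog_id, product_id)


def shard_prefixes(catalog_id: str, num_shards: int = NUM_SHARDS) -> list:
    """Key prefixes that a catalog-wide query would have to cover."""
    return [(shard, catalog_id) for shard in range(num_shards)]


if __name__ == "__main__":
    print(primary_key("catalog-1", "product-42"))
    print(len(shard_prefixes("catalog-1")), "prefixes to scan for one catalog")
```

With this layout a single catalog is no longer one contiguous key range, which is what keeps each range small, but it is also the reason catalog-wide secondary index queries fan out over all shard prefixes.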

Thanks a lot in advance; this would help us a lot in understanding Cloud Spanner better for our planned production use. Christian Gintenreiter


Solution

  • A split can only be served by a single node, so a very large split may turn that node into a performance bottleneck. You may start to see performance degradation once a split grows beyond 2GB. The hard limit on split size is bound by the storage limit for a single node, which is 2TB.

    Can you please provide some more details about your schema and interleaving?