Search code examples
google-cloud-platformgoogle-cloud-spanner

Schema layout regarding storage size of NULL values and join performance


We are exploring Cloud Spanner as a replacement for our Elasticsearch cluster and are currently layouting our database schema.

We are wondering if we should use one big table with lots of nullable columns, or if we should prefer a lean base table with lots of one-to-one child tables. We will have around 25 to 100 billion entries for the bigger tables (some tables probably even more), so massive joins/aggregations could be an issue.

  • do NULL values in a column require the same storage space as real values of the defined column type?

  • does Cloud Spanner have a problem with sparse-value documents like Elasticsearch?

  • how well does Cloud Spanner handle joins/aggregations over billion parent-child documents?


Solution

    • NULL values in Cloud Spanner are very cheap, essentially free.
    • The number of columns per table is limited to 1024. It is also recommended to put less than 4G of data per single parent key. See the full list of limitations here.
    • The parent-child join itself is efficient in that it executes as a single scan. However, it is hard to tell before knowing actual queries that you would like to execute.