google-cloud-bigtable, bigtable

Bigtable schema - multiple columns or rows?


I'm designing a Bigtable schema and trying to optimize read performance. I'm looking for advice on which of these two options would perform better:

  1. Single row with multiple columns (roughly 1-200 columns per row, with most rows having fewer than 10). The only data in each cell would be a timestamp.

  2. Multiple rows for each record, with the fields appended to the row key and just a single column for the timestamp.

I've seen some documentation recommend narrow and tall schemas, which would suggest #2. But that would require reading a range of keys to get the data back, which I presume would be slower than just reading a single row as in option 1?
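To make the two options concrete, here is a minimal in-memory sketch of both layouts. It does not use the Bigtable client at all: a plain dict stands in for the table, and the record ID, field names, `#` separator, and `ts` column name are all hypothetical choices made for illustration. It only shows that a single-row read and a key-prefix scan return the same data.

```python
from bisect import bisect_left

# Hypothetical record; the ID and field names are illustrative only.
RECORD_ID = "user123"
FIELDS = {"fieldA": 1700000000, "fieldB": 1700000100, "fieldC": 1700000200}


def wide_layout(record_id, fields):
    """Option 1: one row per record, one column per field."""
    return {record_id: dict(fields)}


def tall_layout(record_id, fields, sep="#"):
    """Option 2: one row per field, with the field appended to the row key."""
    return {f"{record_id}{sep}{name}": {"ts": ts} for name, ts in fields.items()}


def prefix_scan(table, prefix):
    """Return (key, row) pairs whose key starts with prefix, in key order.

    Models a contiguous range read over lexicographically sorted row keys.
    """
    keys = sorted(table)
    start = bisect_left(keys, prefix)
    rows = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so the matching range has ended
        rows.append((key, table[key]))
    return rows


wide = wide_layout(RECORD_ID, FIELDS)
tall = tall_layout(RECORD_ID, FIELDS)

# Option 1 is a point read on a single row key.
wide_result = wide[RECORD_ID]

# Option 2 is a scan over the adjacent keys that share the record prefix.
tall_result = {
    key.split("#", 1)[1]: row["ts"]
    for key, row in prefix_scan(tall, RECORD_ID + "#")
}
```

Both `wide_result` and `tall_result` end up holding the same field-to-timestamp mapping; the open question is only which access pattern is faster against a real table.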


Solution

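  • I don't think it matters much which way you do this, since the same amount of data would be stored adjacently either way: Bigtable keeps rows sorted by row key, so rows that share a key prefix sit next to each other. The single-row option might use slightly less storage, since you wouldn't have to repeat the row key information for every field.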

    Also, since you only need timestamp data, you can use the timestamp that each cell already carries and make the value something small, such as a single byte, to save storage.

    Either way, the difference will probably be negligible, but if you care about a few milliseconds of latency, I'd recommend loading some data using both schemas and running a batch of reads against each to see whether one is measurably faster.
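A rough way to run that comparison is a small timing harness like the one below. This is only a sketch: `measure_latency`, `read_single_row`, and `read_row_range` are hypothetical names, and the two read functions are empty placeholders that you would replace with your actual Bigtable client calls for each schema.

```python
import statistics
import time


def measure_latency(read_fn, iterations=1000, warmup=50):
    """Call read_fn repeatedly and return latency statistics in milliseconds."""
    # Warm-up calls are not timed, so connection setup does not skew results.
    for _ in range(warmup):
        read_fn()

    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        read_fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    p99_index = min(len(samples) - 1, int(0.99 * len(samples)))
    return {
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[p99_index],
        "mean_ms": statistics.fmean(samples),
    }


def read_single_row():
    """Placeholder: replace with a single-row read against schema 1."""
    return None


def read_row_range():
    """Placeholder: replace with a key-prefix range read against schema 2."""
    return None


if __name__ == "__main__":
    for name, fn in [("single row", read_single_row), ("row range", read_row_range)]:
        print(name, measure_latency(fn))
```

Comparing the median and the 99th percentile rather than only the mean should give a better picture, since tail latency is often where two schemas differ. Running the reads from the same region as the instance and against realistic data volumes will make the numbers more meaningful.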