Tags: google-cloud-bigtable, bigtable

BigTable: One long row vs many short ones


I'm designing a service whose data would be stored in Cloud Bigtable. I am wondering which of the following (equivalent) data models would be more performant:

  1. a single row with many sparse columns
  2. many short rows with a single column. In this option, the row-keys would be grouped together, so fetching them should be easy.

For example, the service I'm working on maps IPs to subdomains:

Option 1:

row-key "1.2.3.4", row-value: "mail.google.com"|empty|"ball.google.com"|"red.google.com"

Option 2:

row-key "1.2.3.4@<timestamp for uniqueness>", row-value: "mail.google.com"
row-key "1.2.3.4@<timestamp for uniqueness>", row-value: "ball.google.com"
row-key "1.2.3.4@<timestamp for uniqueness>", row-value: "red.google.com"

Fetching under option 1 requires reading a single (big) row; fetching under option 2 requires reading multiple (short, contiguous) rows.
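The corresponding reads, under the same assumptions: option 1 is a single point lookup, while option 2 is a small range scan over the shared key prefix (the b"\xff" end key is a simplification that works for this key shape):

    from google.cloud.bigtable.row_set import RowSet

    # Option 1: one point read of the (wide) row.
    row = table.read_row(b"1.2.3.4")
    values = [cell.value
              for cells in row.cells["cf"].values()
              for cell in cells] if row else []

    # Option 2: scan the contiguous key range sharing the "1.2.3.4@" prefix.
    row_set = RowSet()
    row_set.add_row_range_from_keys(start_key=b"1.2.3.4@",
                                    end_key=b"1.2.3.4@\xff")
    values = [r.cells["cf"][b"subdomain"][0].value
              for r in table.read_rows(row_set=row_set)]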

Which is better, performance-wise? My guess is option 1, since it avoids the per-row key overhead of option 2, but I'd like to hear more answers.

(And yes, I am aware of the row-size limit; it is not a problem in my use case.)


Solution

  • If, as you said, the row-size limit is not a problem in your use case, I would say option 1 is better.

    This is, in fact, what the official documentation recommends for better read performance:

    Limiting the number of rows that your nodes have to scan is the first step toward improving time to first byte and overall query latency.

    Option 2 would add unnecessary rows to your query: each extra row is another row your nodes have to scan, which will definitely make your queries slower.