I'm designing a service whose data will be stored in Cloud Bigtable, and I'm wondering which of the following (equivalent) data models would be more performant. For example:
Option 1:
row-key "1.2.3.4", row-value: "mail.google.com"|empty|"ball.google.com"|"red.google.com"
Option 2:
row-key "1.2.3.4@<timestamp for uniqueness>", row-value: "mail.google.com"
row-key "1.2.3.4@<timestamp for uniqueness>", row-value: "ball.google.com"
row-key "1.2.3.4@<timestamp for uniqueness>", row-value: "red.google.com"
Fetching under Option 1 requires reading a single (big) row; fetching under Option 2 requires reading multiple (short, contiguous) rows.
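For concreteness, here is roughly how I'd read the data under each option with the Python client (google-cloud-bigtable); the project/instance/table IDs and the column family name "domains" are just placeholders, not part of my actual schema:

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("my-table")

# Option 1: one point read returns everything stored under the IP.
row = table.read_row(b"1.2.3.4")
if row is not None:
    for qualifier, cells in row.cells["domains"].items():
        print(qualifier, cells[0].value)

# Option 2: a scan over all rows whose key starts with "1.2.3.4@"
# (the end key is just a crude upper bound for that prefix).
for row in table.read_rows(start_key=b"1.2.3.4@", end_key=b"1.2.3.4@\xff"):
    print(row.row_key, row.cells["domains"])
```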
Which is better, performance-wise? My guess is Option 1, because it avoids the per-row key overhead of Option 2, but I'd like to hear other opinions.
(And yes, I am aware of the row-size limit; that's not a problem in my use case.)
If, as you said, the row-size limit is not a problem in your use case, I would say that Option 1 is better.
This is, in fact, what the official documentation recommends for better read performance:
Limiting the number of rows that your nodes have to scan is the first step toward improving time to first byte and overall query latency.
Option 2 would introduce unnecessary rows into your queries, which will definitely result in poorer performance.
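For illustration, here is a minimal sketch of writing Option 1's data with the Python client, putting each hostname in its own cell so that a single row read returns all of them. The column family "domains" and the numeric qualifiers are assumptions for the sketch, not something taken from your question:

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=False)
table = client.instance("my-instance").table("my-table")

# All hostnames for the IP go into one row, one cell per hostname.
row = table.direct_row(b"1.2.3.4")
hostnames = ["mail.google.com", "ball.google.com", "red.google.com"]
for i, hostname in enumerate(hostnames):
    row.set_cell("domains", str(i).encode(), hostname.encode())
row.commit()  # a single mutation; later reads touch only this one row
```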