go, bigtable, google-cloud-bigtable

BigTable: One large query or a dozen small queries?


I store a series of events in BigTable in the following form:

rowKey                | col_1 | col_2
----------------------|-------|------
uuid1!uuid2!timestamp | val1  | val2
....

col_1 holds a float64 and col_2 holds a string 63 characters long.

Specific ranges within this series of events are grouped and loosely associated with an object we'll call an operation:

{
    "id": 123,
    "startDate": "2019-07-15T14:02:12.335+02:00",
    "endDate": "2019-07-15T14:02:16.335+02:00"
}

So you could say that an operation is a time window of events, and each one may be associated with 10 to 1000 events.

When I want to display this data to the user, I first query the operation objects, and then I execute a BigTable query for each operation to find the events it covers.

Through monitoring I've discovered that each BigTable query (against a development instance, mind you) may take anywhere from 20 ms to 300 ms.

This got me wondering, given BigTable's architecture - does it make sense to execute small, individual queries?

Does it make more sense to execute one big query that covers my range of operations, then divide the events to their respective operations in my application?


Solution

  • Most likely yes, one big scan is the better choice, but the details matter here.

    If there are only a few operations per user request then it may actually be better to issue the small queries in parallel. This will get you the best possible latency per request, at the expense of some per-request CPU overhead for your cluster. Your application code will also be more complicated.

    If there are lots of operations per user request, you'll definitely want the increased throughput efficiency that you get from scanning.

    For an advanced use case you could also compromise between the two and break the scan into N shards which you run in parallel, where N is much smaller than the number of operations (N << #operations).

    The one thing you definitely shouldn't do is send the small requests one at a time, as you'll just produce a bunch of unnecessary round trips!