Search code examples
google-bigqueryhbasegoogle-cloud-bigtable

BigTable design for nested (array) data


We are loading data in batch from BigQuery to BigTable. API's access the data via Cloud Functions in BigTable. The header data and the details are fetched in two different calls in the current architecture. However I wanted to store both header and details in one column family. So far, I didn't find any references or examples explaining how to store arrays in BigTable, so I guess it's not possible. Now what would be the best alternative design to be able to serve the API layer, for my case. A simplified example of the data, where call_replies is a STRUCT in BigQuery:

[
   {
      "call_id":"123",
      "caller":"Jeff",
      "call_creation_timestamp":"2020-01-01 19:20:35",
      "call_replies":[
         {
            "email":"[email protected]",
            "message":"Bladiebla"
         },
         {
            "email":"[email protected]",
            "message":"Jaryjary"
         }
      ]
   },
   {
      "call_id":"456",
      "caller":"John",
      "call_creation_timestamp":"2020-01-01 20:20:35",
      "call_replies":[
         
      ]
   }
]

Thanks in advance experts for your help!


Solution

  • You have a few options here.

    1. Versions

      Bigtable data supports versions, so you can have multiple cells at a row/column intersection. For the cell replies, you would just write each entry and make sure you have a garbage collection rule that allows for multiple versions. This will work great if you're storing the email and message in the same column, but can get a bit finicky if you're storing them separately since you would have to make sure the timestamp for each pair matches up.

    2. Bytes

      Everything in Bigtable is stored as bytes, so you can just encode the message when you write it and decode it upon reading it. This might cause a lot more overhead, but would allow you to store the data as you see fit.