google-bigquery hbase google-cloud-bigtable

BigTable design for nested (array) data

We are loading data in batch from BigQuery to BigTable. API's access the data via Cloud Functions in BigTable. The header data and the details are fetched in two different calls in the current architecture. However I wanted to store both header and details in one column family. So far, I didn't find any references or examples explaining how to store arrays in BigTable, so I guess it's not possible. Now what would be the best alternative design to be able to serve the API layer, for my case. A simplified example of the data, where call_replies is a STRUCT in BigQuery:

[
   {
      "call_id":"123",
      "caller":"Jeff",
      "call_creation_timestamp":"2020-01-01 19:20:35",
      "call_replies":[
         {
            "email":"Bladiebla@gmail.com",
            "message":"Bladiebla"
         },
         {
            "email":"jaryjary@gmail.com",
            "message":"Jaryjary"
         }
      ]
   },
   {
      "call_id":"456",
      "caller":"John",
      "call_creation_timestamp":"2020-01-01 20:20:35",
      "call_replies":[
         
      ]
   }
]

Thanks in advance experts for your help!

Solution

You have a few options here.

Versions

Bigtable data supports versions, so you can have multiple cells at a row/column intersection. For the cell replies, you would just write each entry and make sure you have a garbage collection rule that allows for multiple versions. This will work great if you're storing the email and message in the same column, but can get a bit finicky if you're storing them separately since you would have to make sure the timestamp for each pair matches up.
Bytes

Everything in Bigtable is stored as bytes, so you can just encode the message when you write it and decode it upon reading it. This might cause a lot more overhead, but would allow you to store the data as you see fit.