bigtable google-cloud-bigtable

Bigtable row key design for real-time sensor data?


Your company is streaming real-time sensor data from their factory floor into Bigtable and they have noticed extremely poor performance. How should the row key be redesigned to improve Bigtable performance on queries that populate real-time dashboards?

a) Use a row key of the form <timestamp>
b) Use a row key of the form <sensorid>
c) Use a row key of the form <timestamp>#<sensorid>
d) Use a row key of the form <sensorid>#<timestamp>

Based on the documentation, what would be the ideal row key in this case? I think it should be a row key of sensor ID and timestamp, but I have seen some online articles that suggest just the timestamp for the above homework question.

I have two conflicting theories about this use case:

- Since rows are sorted lexicographically, it is not wise to use the timestamp alone as the row key. (From the docs: "Using the timestamp by itself as the row key is not recommended, as most writes would be pushed onto a single node.")
- Since the requirement is a real-time dashboard, the data for all sensor IDs could be stored under a single timestamp, so real-time queries could be run on the timestamp alone.

Please help me pick the ideal row key for this use case.


Solution

  • The question does not say which queries the real-time dashboard runs, and it gives little detail about the performance problem. See the "Schema design for time series data" documentation, which walks through example scenarios. A key made of the timestamp alone can cause hotspotting, because rows sort lexicographically and most writes land on a single node. The ideal key is <sensorid>#<timestamp> (Option D), but the right choice always depends on the use case, which the question leaves unclear.
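
To make the hotspotting argument concrete, here is a minimal sketch in plain Python. It makes no Bigtable client calls: a sorted list stands in for Bigtable's lexicographically sorted rows, and the helper names (`row_key`, `timestamp_first_key`, `prefix_scan`), the sensor IDs, and the zero-padded millisecond timestamp format are all assumptions made for illustration. It compares where a new batch of writes would land under each key layout.

```python
from bisect import bisect_left

def row_key(sensor_id: str, ts_ms: int) -> bytes:
    """Option D layout: <sensorid>#<timestamp> (hypothetical helper)."""
    # Zero-pad so that lexicographic order matches numeric order.
    return f"{sensor_id}#{ts_ms:013d}".encode()

def timestamp_first_key(sensor_id: str, ts_ms: int) -> bytes:
    """Option C layout: <timestamp>#<sensorid> (hypothetical helper)."""
    return f"{ts_ms:013d}#{sensor_id}".encode()

def prefix_scan(sorted_keys: list, prefix: bytes) -> list:
    """Emulate a row-prefix range read over a sorted key list."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches

# Assumed sample data: three sensors, one reading per second for 4 seconds.
sensors = ["sensor-a", "sensor-b", "sensor-c"]
base = 1_700_000_000_000
readings = [(s, base + i * 1000) for i in range(4) for s in sensors]

sensor_first = sorted(row_key(s, t) for s, t in readings)
time_first = sorted(timestamp_first_key(s, t) for s, t in readings)

# The next batch of writes, one per sensor, all with a newer timestamp.
newest = [(s, base + 4000) for s in sensors]

# Index at which each new key would be inserted into the sorted key space.
sensor_first_positions = [bisect_left(sensor_first, row_key(s, t)) for s, t in newest]
time_first_positions = [bisect_left(time_first, timestamp_first_key(s, t)) for s, t in newest]

print(sensor_first_positions)  # spread across the key space
print(time_first_positions)    # all at the tail of the key space
print(prefix_scan(sensor_first, b"sensor-b#"))
```

With the timestamp first, every new write sorts to the very end of the key space, which is the pattern that pushes writes onto a single node. With the sensor ID first, the same writes are spread across one range per sensor, and a dashboard query for one sensor becomes a contiguous prefix scan. If the dashboard instead needs the latest reading across all sensors at once, a different layout may fit better, which is why the answer depends on the actual queries.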