I'm learning how to use HBase. I need to store in the database each trip of several cars, where a trip is a series of geolocated points (x, y). The data comes in JSON format.
The problem is that the number of geolocated points in a trip changes with each document that I retrieve. (Each trip is different.)
How can I store these data in HBase?
Do I have to change the number of columns for each row inserted?
Or do I need to keep only 2 columns, one for all the x values and one for all the y values?
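As I understand it, each trip is a time series of (x, y) coordinates. I would suggest the following schema design:
Row key = shardKey + tripId + timestamp, and each row has x and y columns.
With this layout every row holds exactly one point, so a longer trip simply produces more rows and the set of columns never changes.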
The shard key can be (tripId % number of regions), which helps prevent hotspotting by spreading different trips across regions.
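To make the key layout concrete, here is a minimal sketch of how the row keys and rows could be built from one trip document. It is not HBase client code, only the key construction. The JSON field names (tripId, points, t, x, y), the value NUM_REGIONS = 8, and the fixed-width binary encoding are all my assumptions, since the question does not show the actual format.

```python
import json
import struct

NUM_REGIONS = 8  # assumption: the table is pre-split into 8 regions


def make_row_key(trip_id, timestamp_ms, num_regions=NUM_REGIONS):
    # shardKey (1 byte) + tripId (8 bytes) + timestamp (8 bytes), big-endian.
    # Fixed-width big-endian fields keep byte-wise order equal to numeric
    # order for non-negative values, so points of a trip sort by time.
    shard = trip_id % num_regions
    return struct.pack(">BQQ", shard, trip_id, timestamp_ms)


def trip_to_rows(doc):
    # One (row_key, columns) pair per point; every row has the same
    # two columns, however many points the trip contains.
    trip = json.loads(doc)
    rows = []
    for point in trip["points"]:
        key = make_row_key(trip["tripId"], point["t"])
        rows.append((key, {"x": point["x"], "y": point["y"]}))
    return rows


# Hypothetical document shape, for illustration only.
doc = (
    '{"tripId": 42, "points": ['
    '{"t": 1000, "x": 2.35, "y": 48.85}, '
    '{"t": 2000, "x": 2.36, "y": 48.86}]}'
)
rows = trip_to_rows(doc)
```

Each entry in rows would then become one Put in whatever client you use. Note that a one-byte shard prefix limits this sketch to at most 256 regions.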
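Because all rows of a trip share the same shardKey + tripId prefix, they are stored next to each other, which allows you to retrieve the data for a trip with a single scan from one region.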
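As a rough illustration of that scan, the start and stop rows for one trip can be derived from the key prefix alone. This sketch assumes the same hypothetical encoding as the design above (1-byte shard key, 8-byte big-endian tripId, 8-byte big-endian timestamp), NUM_REGIONS = 8, and a tripId smaller than 2^64 - 1. The stop row is treated as exclusive, which is how HBase scans treat it by default.

```python
import struct

NUM_REGIONS = 8  # assumption: the table is pre-split into 8 regions


def trip_scan_range(trip_id, num_regions=NUM_REGIONS):
    # All rows of a trip start with shardKey + tripId, so the scan covers
    # [prefix(trip_id), prefix(trip_id + 1)).
    shard = trip_id % num_regions
    start_row = struct.pack(">BQ", shard, trip_id)      # inclusive
    stop_row = struct.pack(">BQ", shard, trip_id + 1)   # exclusive
    return start_row, stop_row


def in_range(row_key, start_row, stop_row):
    # Same byte-wise comparison that decides which rows a scan returns.
    return start_row <= row_key < stop_row


start_row, stop_row = trip_scan_range(42)
key_same_trip = struct.pack(">BQQ", 42 % NUM_REGIONS, 42, 1000)
key_other_trip = struct.pack(">BQQ", 50 % NUM_REGIONS, 50, 1000)
```

Here key_same_trip falls inside the range and key_other_trip does not, even though trip 50 lands in the same shard as trip 42.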