
Clarification regarding OpenTSDB's data model


I'm working with OpenTSDB for a school project in which I have to design a structure for storing time series data from robots. Each robot reports data 5 times per second, and there can be up to 100 active robots.

I've managed to set up OpenTSDB and link it to an HBase cluster. However, after reading the documentation on OpenTSDB's website, I still don't have a clear picture of the data model. The website says that every time series data point requires the following:

  • metric
  • timestamp
  • value
  • tag(s): key/value pairs
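The four fields above map directly onto a small record type. The sketch below is a minimal illustration, assuming the JSON field names used later in this post (`metric`, `timestamp`, `value`, `tags`); the `DataPoint` class is hypothetical and not part of any OpenTSDB client library. It shows that `value` is the only numeric payload, while `tags` merely identify which series the value belongs to.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, Union


@dataclass
class DataPoint:
    """One OpenTSDB data point (hypothetical helper class).

    metric:    name of the time series, e.g. "value1"
    timestamp: Unix epoch seconds
    value:     the single numeric measurement this point carries
    tags:      key/value strings identifying which series the value belongs to
    """
    metric: str
    timestamp: int
    value: Union[int, float]
    tags: Dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        # Serialize to the JSON object shape shown in the solution below
        return json.dumps(asdict(self))


point = DataPoint("value1", 1429542213, 10, {"robotName": "1"})
print(point.to_json())
# {"metric": "value1", "timestamp": 1429542213, "value": 10, "tags": {"robotName": "1"}}
```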

This brings me to my first question: what is the value, and why is it needed?

Going back to the robots: each robot is identified by two IDs, and each measurement consists of 9 values. Hence each measurement is associated with a total of 11 values/tags. Adding the metric, the value and a Unix timestamp brings the total to 14 tags in OpenTSDB, and the default setup of OpenTSDB doesn't support that many tags. I know that the maximum number of tags can be raised in OpenTSDB's configuration file, but I've also read that increasing the number of tags can dramatically slow down queries.

Any suggestions on how I should tackle this? Should I just increase the number of tags? Or is there another way to solve this?

Note: All the values associated with a measurement will always be accessed and plotted together.


Solution

  • So I just realized that each OpenTSDB data point carries exactly one numeric value, which is what gets plotted, and that is the purpose of the value... stupid me :)

    My original thought was that the tags (key/value pairs) could be used as values to plot. But tags only identify which time series a value belongs to; they act as search and grouping criteria for the actual value. So if you have a structure similar to mine, you have to store each value as a separate metric and tag it with the robot it came from. In JSON it would look something like this:

    [
      {"metric": "value1", "timestamp": 1429542213, "value": 10, "tags": {"robotName": "1"}},
      {"metric": "value2", "timestamp": 1429542213, "value": 20, "tags": {"robotName": "1"}}
      // ... value3 through value9 follow the same pattern
    ]
    

    This means that for each robot I need to store 9 different time series (one per metric). Since each robot sends data 5 times per second, this amounts to 45 data points per second per robot. With 100 active robots, that is 4,500 data points per second spread across 900 time series.
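The approach above can be sketched as a function that splits one 9-value measurement into one data point per metric, followed by the load arithmetic. This is a minimal sketch under stated assumptions: the metric names `value1`..`value9` and the single `robotName` tag come from the JSON example (the robot's second ID could be added as another tag), `measurement_to_points` is a hypothetical helper, and it assumes OpenTSDB 2.x's HTTP `/api/put` endpoint, which accepts a JSON array of data points.

```python
import json
from typing import Dict, List, Sequence

NUM_VALUES = 9          # values per measurement (from the question)
SAMPLES_PER_SECOND = 5  # measurement rate per robot
ACTIVE_ROBOTS = 100     # upper bound on concurrent robots


def measurement_to_points(robot_name: str, timestamp: int,
                          values: Sequence[float]) -> List[Dict]:
    """Split one measurement into NUM_VALUES data points (hypothetical helper).

    Each value becomes its own metric ("value1".."value9"); the robot is
    identified only through the "robotName" tag. Tag values are strings.
    """
    if len(values) != NUM_VALUES:
        raise ValueError(f"expected {NUM_VALUES} values, got {len(values)}")
    return [
        {
            "metric": f"value{i}",
            "timestamp": timestamp,
            "value": v,
            "tags": {"robotName": robot_name},
        }
        for i, v in enumerate(values, start=1)
    ]


points = measurement_to_points("1", 1429542213, [10, 20, 30, 40, 50, 60, 70, 80, 90])
payload = json.dumps(points)  # JSON array body for a single /api/put request

# Load estimate: data points per second and number of distinct time series
points_per_robot = NUM_VALUES * SAMPLES_PER_SECOND  # 45
points_total = points_per_robot * ACTIVE_ROBOTS     # 4500
distinct_series = NUM_VALUES * ACTIVE_ROBOTS        # 900 (metric + robotName pairs)
print(len(points), points_per_robot, points_total, distinct_series)  # 9 45 4500 900
```

Sending all 9 points of a measurement in one request keeps the number of HTTP calls at 5 per second per robot rather than 45, while each point still uses only one tag, well within the default tag limit.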