
InfluxDB: single or multiple measurements?


I'm a beginner with InfluxDB, and after reading the schema design documentation one question remains.

How do you decide whether to use one measurement with multiple fields or multiple measurements with a single field each?

I have multiple IoT devices that each send data (temperature, humidity, pressure) every minute. All of these values share the exact same timestamp.

So I was wondering whether I should create one measurement like this:

    timestamp,  iotid, temperature, humidity, pressure
    --------------------------------------------------
    1501230195, iot1,  70,          45,       850

Or three measurements (one for each value), with the same tags but only one field each?

    timestamp,  iotid, temperature
    ------------------------------
    1501230195, iot1,  70

    timestamp,  iotid, humidity
    ---------------------------
    1501230195, iot1,  45

    timestamp,  iotid, pressure
    ---------------------------
    1501230195, iot1,  850

Query-wise, I may need to retrieve just one value, but also all three at the same time.


Solution

  • Bit of an old question but this is probably relevant to anyone working on TSDBs.

    When I first started, my approach was to put every data point into its own single-field measurement. The assumption was that I'd combine the data I needed in a SQL statement at a later date. However, as anyone who has used a TSDB like InfluxDB knows, there are some serious limitations on what one can do when retrieving data, because of the design choices made in implementing a TSDB.

    As I've moved forward in my project, here are the rules of thumb I have developed:

    A measurement should contain all the dimensions required for it to make sense but no more.

    Example: imagine a gas flow meter which gives 3 signals:

    • volumetric flow
    • temperature
    • total flow

    In this scenario, volumetric flow and temperature should be two fields of a single measurement, and total flow should be its own measurement.
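    As a rough sketch of what that split could look like on the wire, the snippet below formats two points in InfluxDB line protocol. The measurement names (`flow`, `total_flow`), the tag `meter_id`, the field names and all values are made up for illustration, and the helper does no escaping and only handles float fields.

```python
def to_line(measurement, tags, fields, timestamp_ns):
    """Format one point as InfluxDB line protocol.

    Simplified: no escaping of special characters, float fields only.
    """
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={float(v)}" for k, v in fields.items())
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"


# Line protocol timestamps default to nanosecond precision.
ts = 1501230195 * 10**9

# Flow and temperature only make sense together: one measurement, two fields.
flow_line = to_line(
    "flow",
    {"meter_id": "m1"},
    {"volumetric_flow": 12.5, "temperature": 293.15},
    ts,
)

# Total flow stands on its own: a separate measurement with a single field.
total_line = to_line("total_flow", {"meter_id": "m1"}, {"total": 10432.0}, ts)

print(flow_line)
print(total_line)
```

    Both fields of `flow` share one timestamp by construction, which is the point of keeping them in the same measurement.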

    (If the reader doesn't like this example, think of a home electric meter that outputs amps and volts, and kW and PF.)

    Why would it be bad to store volumetric and temp in different series?

    1. Timing: if you store those two signals in different series, each series gets its own timestamps. Unless you take care to write them with the same explicitly specified timestamp, you run the risk of them being sampled at slightly different times. This can very well end up being a Bad Thing (tm) because you might be introducing a systematic measurement bias in your data. Even if it's not a bad thing, it's going to be super annoying if you ever want to reuse this data later on (e.g. to dump it in a CSV file).

    2. Utility: if you want to deduce the volumetric flow rate, you have to compute constant * temp * volume to get a correct value. Doing this with two separate measurements becomes a nightmare because, for instance, InfluxQL does not support arithmetic across measurements. But even if it did, you'd have to make sure that missing values in one of the fields are handled correctly and that grouping and aggregation are done right.
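    To make both points concrete, here is a small client-side sketch in plain Python (no InfluxDB involved). The constant `K`, the timestamps and the readings are invented; the second half mimics two separately written series where one temperature sample arrives a second late and another is missing.

```python
K = 0.5  # hypothetical conversion constant

# One measurement with two fields: every row already pairs flow with temperature.
rows = [
    (0, 10.0, 290.0),    # (timestamp_s, flow, temperature)
    (60, 11.0, 291.0),
    (120, 12.0, 292.0),
]
combined = [(t, K * temp * flow) for t, flow, temp in rows]

# Two separate series: timestamps drift and samples can go missing.
flow_series = {0: 10.0, 60: 11.0, 120: 12.0}
temp_series = {0: 290.0, 61: 291.0}  # second sample is 1 s late, third is missing

# A naive join on exact timestamps silently drops every unmatched point.
joined = [
    (t, K * temp_series[t] * flow_series[t])
    for t in sorted(flow_series)
    if t in temp_series
]

print(len(combined), "paired values from one measurement")
print(len(joined), "paired value(s) after joining two series")
```

    Recovering the dropped points would need a tolerance window or interpolation, which is exactly the kind of extra handling the single-measurement layout avoids.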

    Why would it be bad to store all three in a single measurement?

    You may very well have a use case in which you want to audit all three values at all times, but chances are this is not the case and you don't care about measuring total volume at the same kind of frequency that you'd like to measure flow itself.

    Putting all the fields in a single measurement will force you to either put nulls in certain fields, or to always log a variable that barely changes. Either way, it's not efficient.
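    A quick way to see the cost is to count the empty cells. The sketch below assumes a made-up cadence (flow and temperature every minute, total flow once an hour) and made-up values, and models rows as plain dictionaries rather than anything InfluxDB-specific.

```python
MINUTES = 60  # one hour of per-minute samples (hypothetical cadence)

# Everything in one wide measurement: total_flow is only known once an hour.
wide_rows = [
    {
        "time": minute * 60,
        "volumetric_flow": 12.0,
        "temperature": 293.0,
        "total_flow": 10432.0 if minute == 0 else None,
    }
    for minute in range(MINUTES)
]
empty_cells = sum(1 for row in wide_rows if row["total_flow"] is None)

# Split layout: each measurement only holds rows where it has something to say.
flow_rows = [
    {k: v for k, v in row.items() if k != "total_flow"} for row in wide_rows
]
total_rows = [
    {"time": row["time"], "total_flow": row["total_flow"]}
    for row in wide_rows
    if row["total_flow"] is not None
]

print(empty_cells, "empty total_flow cells out of", MINUTES)
print(len(flow_rows), "flow rows,", len(total_rows), "total_flow row")
```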

    The important insight is that multi-dimensional entities require all their dimensions at the same time to make sense.