
InfluxDB syntactic difference


Is there any difference between these two forms?

myMetric value1=1,value2=2

and this

myMetric.value1 v=1
myMetric.value2 v=2

Both store the same data (two points). Obviously, they are accessed in different ways, but is there any difference in storage, performance, etc.? As per this talk, the first one gets converted to the second one, at least semantically.


Solution

  • According to the InfluxDB docs, the line protocol syntax is:

    <measurement>[,<tag_key>=<tag_value>[,<tag_key>=<tag_value>]] <field_key>=<field_value>[,<field_key>=<field_value>] [<timestamp>]

    Your first form inserts one record into the measurement myMetric, with no tags and two fields (value1, value2) holding the values 1 and 2 respectively. Since no timestamp is supplied in the data, the server timestamp is used for the data point.

    In the second case you are creating two separate measurements, myMetric.value1 and myMetric.value2, each having one field named v, with the values 1 and 2 respectively. Their timestamps are likely to differ too, given the default nanosecond precision.

    So, these two cases are not equivalent.
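
    To make the difference concrete, here is a minimal Python sketch that splits a line-protocol line into its parts. The helper parse_line is hypothetical and deliberately simplified: it assumes numeric field values and no escaped spaces, commas, or quoted strings, so it is not a full line-protocol parser.

```python
def parse_line(line):
    """Split one line-protocol line into (measurement, tags, fields, timestamp).

    Simplified on purpose: numeric fields only, no escaping, no quoted strings.
    """
    parts = line.split(" ")
    head, field_part = parts[0], parts[1]
    # The timestamp is optional; None means "let the server assign one".
    timestamp = int(parts[2]) if len(parts) > 2 else None
    measurement, *tag_items = head.split(",")
    tags = dict(item.split("=", 1) for item in tag_items)
    fields = {}
    for item in field_part.split(","):
        key, value = item.split("=", 1)
        fields[key] = float(value)
    return measurement, tags, fields, timestamp


form1 = [parse_line("myMetric value1=1,value2=2")]
form2 = [parse_line("myMetric.value1 v=1"), parse_line("myMetric.value2 v=2")]

for point in form1 + form2:
    print(point)
```

    Under these assumptions, the first form yields one point in one measurement with two fields, while the second yields two points in two differently named measurements with one field each.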

    Using the influx CLI tool, the first case looks like this:

    > INSERT myMetric value1=1,value2=2
    > show measurements
    name: measurements
    name
    ----
    myMetric
    > show field keys from myMetric
    name: myMetric
    fieldKey fieldType
    -------- ---------
    value1   float
    value2   float
    > select * from myMetric
    name: myMetric
    time                value1 value2
    ----                ------ ------
    1526032578114702408 1      2
    

    For the second case:

    > INSERT myMetric.value1 v=1
    > INSERT myMetric.value2 v=2
    > show measurements
    name: measurements
    name
    ----
    myMetric.value1
    myMetric.value2
    > select * from "myMetric.value1"
    name: myMetric.value1
    time                v
    ----                -
    1526032859752277164 1
    > select * from "myMetric.value2"
    name: myMetric.value2
    time                v
    ----                -
    1526032864711858673 2
    

    As you can see, in case 1 there is one insert operation into one measurement, for one data point with two fields. In case 2 there are two insert operations into two distinct measurements with one field each.

    Thus, if value1 and value2 are usually inserted together in your use case, I would expect the first variant to be more performant: case 2 requires two inserts for the same data. Storage usage is likely to be approximately the same.
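
    As a rough illustration of what each layout sends, the sketch below builds the line-protocol text for both cases in Python. The helper to_line is hypothetical, handles only numeric fields without tags, and attaches an explicit timestamp (borrowed from the first session above) so that the two case-2 points would at least share the same time.

```python
def to_line(measurement, fields, timestamp_ns):
    """Format one point as line protocol (numeric fields only, no tags)."""
    field_part = ",".join(f"{key}={value}" for key, value in fields.items())
    return f"{measurement} {field_part} {timestamp_ns}"


ts = 1526032578114702408  # explicit timestamp, taken from the session above

# Case 1: one line carrying both fields.
case1 = [to_line("myMetric", {"value1": 1, "value2": 2}, ts)]

# Case 2: two lines, one per measurement.
case2 = [
    to_line("myMetric.value1", {"v": 1}, ts),
    to_line("myMetric.value2", {"v": 2}, ts),
]

print("\n".join(case1))
print("\n".join(case2))
```

    Case 1 produces a single line for the pair of values, while case 2 produces two lines for the same information.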

    If value1 and value2 are inserted independently and at different times, case 2 can be a bit more efficient in terms of storage, as it does not have to store nulls for data points like (null,2) or (1,null).

    Keeping the fields in separate measurements also has another drawback. Queries like:

    > select value1, value2,value2-value1 from myMetric
    name: myMetric
    time                value1 value2 value2_value1
    ----                ------ ------ -------------
    1526032578114702408 1      2      1
    

    won't be possible in the second case.
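
    If the split layout is kept anyway, the arithmetic would have to happen in the client. A minimal Python sketch, assuming the rows of each measurement have already been fetched into dicts keyed by timestamp (fetching is not shown, and diff_by_time is a hypothetical helper):

```python
def diff_by_time(rows_a, rows_b):
    """Return {time: b - a} for every timestamp present in both inputs."""
    shared = sorted(rows_a.keys() & rows_b.keys())
    return {t: rows_b[t] - rows_a[t] for t in shared}


# Rows as they appear in the session above: server-assigned timestamps differ.
value1 = {1526032859752277164: 1.0}
value2 = {1526032864711858673: 2.0}
unmatched = diff_by_time(value1, value2)  # no common timestamp to join on

# Hypothetical rows written with one explicit, shared timestamp.
ts = 1526032859752277164
matched = diff_by_time({ts: 1.0}, {ts: 2.0})

print(unmatched)
print(matched)
```

    With the server-assigned timestamps from the session there is nothing to join on, so this kind of client-side calculation only works when both points are written with the same explicit timestamp.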