performance database-schema influxdb collectd

schema design for multiple measurements

I am working with collectd and influxdb.

collectd v5.5 allows memory and CPU values to be reported as percentages. However, all the percentage values are written into a single measurement called "percent_value". The measurement has the tag type_instance, whose values come from both memory (used, cache...) and CPU (idle, user, irq).

======================================================================

Measurement: percent_value

Tags: type_instance=[cache, free, used, buffered, idle, nice, softirq, steal, system, user...]

Fields: value

======================================================================

Why are all the values written into a single measurement instead of two separate measurements? Would the following make more sense or perform better?

======================================================================

Measurement: mem_percent_value

Tags: type_instance=[cache, free, used, buffered]

Fields: value

Measurement: cpu_percent_value

Tags: type_instance=[idle, nice, softirq, steal, system, user...]

Fields: value

======================================================================

In terms of schema design for good performance, is it better to have a single measurement with many tag values, or multiple measurements that each hold only the tag values belonging to them? I will be designing some new measurements: should I also store them in a single measurement with all the tag values, or separate them?

Solution

tl;dr: use fewer measurements, with tags for the things you need indexed and fields for everything else. Keep in mind that tag values are always stored as strings.

I would recommend first reading about series cardinality: https://docs.influxdata.com/influxdb/v1.2/concepts/glossary/#series-cardinality

Unlike Graphite, whose constraints force you to have a large number of measurements, the InfluxDB recommendation is to favor a smaller number of measurements and to use tags for the attributes that will help you write more performant queries. With regard to performance: given a constant number of indexable "attributes", it should not matter whether you have many measurements with no tags or few measurements with many tags. In other words, you should end up with the same series cardinality in either case.
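To make the cardinality point concrete, here is a rough sketch that counts series for the two schemas from the question. It assumes a series is identified by its measurement plus its tag set; the host names are made up for illustration, and the lists contain only the type_instance values spelled out in the question.

```python
from itertools import product

# Hypothetical hosts; type_instance values are the ones listed in the question.
HOSTS = ["web01", "web02"]
MEM_INSTANCES = ["cache", "free", "used", "buffered"]
CPU_INSTANCES = ["idle", "nice", "softirq", "steal", "system", "user"]


def series_keys(points):
    """Return the set of distinct (measurement, sorted tag set) pairs."""
    return {(measurement, tuple(sorted(tags.items()))) for measurement, tags in points}


# Schema A: one shared measurement, as collectd writes it.
single = [
    ("percent_value", {"host": h, "type_instance": t})
    for h, t in product(HOSTS, MEM_INSTANCES + CPU_INSTANCES)
]

# Schema B: one measurement per plugin, as proposed in the question.
split = [
    ("mem_percent_value", {"host": h, "type_instance": t})
    for h, t in product(HOSTS, MEM_INSTANCES)
] + [
    ("cpu_percent_value", {"host": h, "type_instance": t})
    for h, t in product(HOSTS, CPU_INSTANCES)
]

print(len(series_keys(single)), len(series_keys(split)))  # 20 20
```

Both layouts produce the same number of series, because splitting the measurement only moves information from a tag value into the measurement name.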

This is a helpful section as well: https://docs.influxdata.com/influxdb/v1.2/concepts/schema_and_data_layout/#discouraged-schema-design

I'll give you a concrete example I dealt with regarding JVM Memory.

Option 1: Many measurements
- jvm.memory.heap.used [jmxport=1099, ...] value=1024

Option 2: Fewer measurements
- jvm.memory [metric_type=heap, jmxport=1099, ...] value=1024
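As a rough illustration, the two options could be written to InfluxDB in line protocol roughly like this. The helper `to_line` is hypothetical, not part of any client library; it skips escaping and timestamps, and it includes only the tags named above (the elided ones are left out).

```python
def to_line(measurement, tags, fields):
    """Format one point as InfluxDB line protocol (no timestamp, no escaping)."""
    # Tags are appended to the measurement name, sorted by key.
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    # Fields follow after a single space; a bare number is read as a float.
    field_part = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part}"


# Option 1: the metric path is encoded in the measurement name.
option_1 = to_line("jvm.memory.heap.used", {"jmxport": "1099"}, {"value": 1024})

# Option 2: one measurement, with the memory type moved into a tag.
option_2 = to_line("jvm.memory", {"metric_type": "heap", "jmxport": "1099"}, {"value": 1024})

print(option_1)  # jvm.memory.heap.used,jmxport=1099 value=1024
print(option_2)  # jvm.memory,jmxport=1099,metric_type=heap value=1024
```

With Option 2 you can filter or group on metric_type in a query instead of matching measurement names.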

Here is the same kind of example, just a bit more comprehensive: https://github.com/jmxtrans/jmxtrans/wiki/StatsDTelegrafWriter#schema-design