
Is InfluxDB series cardinality dependent on the number of field keys in a measurement?


I'm trying to understand the right approach for calculating the series cardinality of a bucket, because I'm seeing a mismatch between the value returned by the influxdb.cardinality() function and the value I calculate from the definition in the documentation.

According to the documentation (and this video explaining the TSM engine), series cardinality is the number of unique measurement, tag set, and field key combinations in an InfluxDB bucket.

Assume a measurement with just one tag and two field keys, and data as follows (line protocol):

m1,loc=abc speed=10,temp=20
m1,loc=xyz speed=10,temp=20

Based on the definition, I was expecting this bucket to have a series cardinality of 4 (1 measurement * 2 unique tag values * 2 field keys), and there are four series visible (assuming each series has only one field).

However, when using the influxdb.cardinality() function, I get a series cardinality of 2 (1 measurement * 2 unique tag values). This gives the impression that a series can have multiple fields.

import "influxdata/influxdb"
influxdb.cardinality(
  bucket: "test-cardinality",
  start: -4h
)

Solution

  • The short answer is no: InfluxDB series cardinality does NOT depend on the number of field keys in a measurement.

    The reporttsi command of influx_inspect calculates "the total exact series cardinality in the database".

    We can drill down through the code layer by layer. From the reporttsi code here, we can see that the system iterates through all shards, then through all measurements, and calculates their series IDs. The series ID is defined in the code here. We can see that it depends only on the measurement and the tags.

    // CreateSeriesListIfNotExists creates a list of series in bulk if they don't exist.
    // The returned ids slice returns IDs for every name+tags, creating new series IDs as needed.
    func (f *SeriesFile) CreateSeriesListIfNotExists(names [][]byte, tagsSlice []models.Tags) ([]uint64, error) {
    

    Field keys or values are not considered during the series cardinality calculation.