Search code examples
influxdbinfluxdb-2

Influx DB aggregate grouped data


I am very new to Influx DB, the flux query language, and time-series databases in general. I have a bucket filled with firewall log entries, and I'd like to count the number of entries grouped by source IP address. In SQL I'd do something like this:

select 
    _value, 
    count(*) as count
from my_bucket 
where _field = 'ufw_src_ip' 
group by _value;

After many hours of banging my head against the wall, I have come up with this:

from(bucket: "my_bucket")
  |> range(start: -30d)
  |> filter(fn: (r) => r["_field"] == "ufw_src_ip")
  |> group(columns: ["_value"])
  |> count()

but to no avail, it gives me

cannot aggregate columns that are part of the group key

So how do I actually translate this logic to its flux equivalent? Is it even possible to do it?


Solution

  • After some more trial and error I managed to get it working:

    from(bucket: "my_bucket")
      |> range(start: -30d)
      |> filter(fn: (r) => r["_field"] == "ufw_src_ip")
      |> group(columns: ["_value"])
      |> count(column: "_field")
      |> map(fn: (r) => ({
        "Source IP": r._value,
        "Count": r._field
      }))
      |> sort(columns: ["Count"], desc: true)
    

    The trick was to count the values from a column that is not part of the group key, so in this case counting the _field name column worked (whereas not explicitly specifying a column would cause it to count the _value column which is part of the group key).

    I am still not entirely sure why this restriction exists, but at this point I'm just happy it's working :P

    Leaving this here for anybody coming from an SQL-background who might be running into a similar issue.