Tags: influxdb, influxdb-2, flux-influxdb

Take the median of a grouped set


I am quite new to Flux and want to solve an issue:

I got a bucket containing measurements, which are generated by a worker-service.

Each measurement belongs to a site and has an identifier (uuid). Each measurement contains three measurement points containing a value.

What I want to achieve is the following: create a graph/list/table of measurements for a specific site, aggregating the three measurement points of each measurement into their median value.

TL;DR:

  • Get all measurement points that belong to the specific site UUID
  • As each measurement has a UUID and contains three measurement points, group by measurement and take the median for each measurement
  • Return a result that only contains the median value for each measurement

This does not work:

from(bucket: "test")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "lighthouse")
  |> filter(fn: (r) => r["_field"] == "speedindex")
  |> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
  |> group(columns: ["measurement"])
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> yield(name: "mean")


This does not throw an error, but it of course does not take the median of the specific groups.

This is the result (simple table):


Solution

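  • If I understand your question correctly, you want a single number to be returned for each group. In that case, use a bare aggregate function instead of aggregateWindow. The query below uses |> mean(), as in your script; swap in |> median() to get the median: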

    from(bucket: "test")
      |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
      |> filter(fn: (r) => r["_measurement"] == "lighthouse")
      |> filter(fn: (r) => r["_field"] == "speedindex")
      |> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
      |> group(columns: ["measurement"])
      |> mean()
      |> yield(name: "mean")
    

    The aggregateWindow function aggregates your values over (multiple) windows of time. The script you posted computes the mean over each v.windowPeriod (in this case 20 minutes).

    I am not entirely sure what v.windowPeriod represents, but I usually use time literals for all times (including start and stop). I find it easier to understand how the query relates to the result that way.

    On a side note: the yield function only names your result, which allows a single script to return multiple results. It does not compute anything.
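
    Putting these points together, here is a minimal sketch of a median version. It assumes that "measurement" is the tag holding the measurement UUID (as in your script) and that the speedindex values are stored as floats, which median() requires. The 24-hour range is a placeholder I picked for illustration, not something taken from your data.

    ```flux
    // Shared base: all speedindex points for one site, one group per measurement.
    // Assumption: "measurement" is the tag that carries the measurement UUID.
    base = from(bucket: "test")
      |> range(start: -24h)  // literal duration; placeholder, adjust to your data
      |> filter(fn: (r) => r["_measurement"] == "lighthouse")
      |> filter(fn: (r) => r["_field"] == "speedindex")
      |> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
      |> group(columns: ["measurement"])

    // One row per group: the median of that group's _value column.
    base
      |> median()
      |> yield(name: "median")

    // A second named result from the same base, for comparison.
    base
      |> mean()
      |> yield(name: "mean")
    ```

    By default median() uses an estimating method; passing method: "exact_mean" should give an exact result, which is cheap for three points per group. If you would rather keep a time series than get one row per group, aggregateWindow also accepts fn: median.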