Search code examples
time-seriesinfluxdb

Grouping influx data per day using flux


I have been wrangling with a time grouping issue in influxDB using the flux query language. I can illustrate with generated data, one entry per day from 2021-01-01 to 2021-01-05.

import "generate"
data = generate.from(
  count: 5,
  fn: (n) => n + 1,
  start: 2021-01-01T00:00:00Z,
  stop: 2021-01-06T00:00:00Z,
)

data
  |> range(start: 2021-01-01T00:00:00Z, stop: 2021-01-05T05:00:00Z) 

that generates:

_table _value _start _stop _time
0 1 2021-01-01T00:00:00.000Z 2021-01-05T23:59:00.000Z 2021-01-01T00:00:00.000Z
0 2 2021-01-01T00:00:00.000Z 2021-01-05T23:59:00.000Z 2021-01-02T00:00:00.000Z
0 3 2021-01-01T00:00:00.000Z 2021-01-05T23:59:00.000Z 2021-01-03T00:00:00.000Z
0 4 2021-01-01T00:00:00.000Z 2021-01-05T23:59:00.000Z 2021-01-04T00:00:00.000Z
0 5 2021-01-01T00:00:00.000Z 2021-01-05T23:59:00.000Z 2021-01-05T00:00:00.000Z

now I query the data and aggregate per day

import "generate"
data = generate.from(
count: 5,
  fn: (n) => n + 1,
  start: 2021-01-01T00:00:00Z,
  stop: 2021-01-06T00:00:00Z,
)
data
  |> range(start: 2021-01-01T00:00:00Z, stop: 2021-01-05T23:59:00Z)
  |> aggregateWindow(every: 1d, fn: sum, createEmpty: false)

I get this

_table _value _start _stop _time
0 1 2021-01-01T00:00:00.000Z 2021-01-05T23:59:00.000Z 2021-01-02T00:00:00.000Z
0 2 2021-01-01T00:00:00.000Z 2021-01-05T23:59:00.000Z 2021-01-03T00:00:00.000Z
0 3 2021-01-01T00:00:00.000Z 2021-01-05T23:59:00.000Z 2021-01-04T00:00:00.000Z
0 4 2021-01-01T00:00:00.000Z 2021-01-05T23:59:00.000Z 2021-01-05T00:00:00.000Z
0 5 2021-01-01T00:00:00.000Z 2021-01-05T23:59:00.000Z 2021-01-05T23:59:00.000Z

the first time is 2021-01-02T00:00:00.000Z and not 2021-01-01 and the two last entries cover the same day 2021-01-05.

how can I get the entries per day as below using flux:

  • 2021-01-01 - 1
  • 2021-01-02 - 2
  • 2021-01-03 - 3
  • 2021-01-04 - 4
  • 2021-01-05 - 5

Solution

  • You simply need to change/add the timeSrc parameter in your aggregateWindow window function. By default, Flux will use _stop as value of the parameter, creating "high" bound windows.

    Let's make your example a bit simpler with the time not being exact at day end/start for better illustration: If for instance, it is 22:20 system time, I issue a query with range "-1h" and group by 1h, Flux will create two tumbling time windows: 21:00-22:00 and 22:00-23:00. In fact, Flux detects its range and truncates those global time windows to 21:20-22:00 and 22:00-22:20.

    If now, aggregateWindow timeSrc is set to _stop, the time column will be 22:00 and 22:20. [the end of each previously mentioned time windows]. If you chose "_start", the time column will be 21:20 and 22:00.

    If you want 21:00 and 22:00 in the above example, I suggest adjusting your range to something like: range(start: date.truncate(t: -24h, unit: 1h) which usually also makes more sense that having a first bucket also contains all the data from that bucket.