Grouping influx data per day using flux

I have been wrangling with a time grouping issue in influxDB using the flux query language. I can illustrate with generated data, one entry per day from 2021-01-01 to 2021-01-05.

import "generate"
data = generate.from(
  count: 5,
  fn: (n) => n + 1,
  start: 2021-01-01T00:00:00Z,
  stop: 2021-01-06T00:00:00Z,
)

data
  |> range(start: 2021-01-01T00:00:00Z, stop: 2021-01-05T05:00:00Z)

that generates:

_value	_start	_stop	_time
1	2021-01-01T00:00:00.000Z	2021-01-05T23:59:00.000Z	2021-01-01T00:00:00.000Z
2	2021-01-01T00:00:00.000Z	2021-01-05T23:59:00.000Z	2021-01-02T00:00:00.000Z
3	2021-01-01T00:00:00.000Z	2021-01-05T23:59:00.000Z	2021-01-03T00:00:00.000Z
4	2021-01-01T00:00:00.000Z	2021-01-05T23:59:00.000Z	2021-01-04T00:00:00.000Z
5	2021-01-01T00:00:00.000Z	2021-01-05T23:59:00.000Z	2021-01-05T00:00:00.000Z

now I query the data and aggregate per day

import "generate"
data = generate.from(
count: 5,
  fn: (n) => n + 1,
  start: 2021-01-01T00:00:00Z,
  stop: 2021-01-06T00:00:00Z,
)
data
  |> range(start: 2021-01-01T00:00:00Z, stop: 2021-01-05T23:59:00Z)
  |> aggregateWindow(every: 1d, fn: sum, createEmpty: false)

I get this

_value	_start	_stop	_time
1	2021-01-01T00:00:00.000Z	2021-01-05T23:59:00.000Z	2021-01-02T00:00:00.000Z
2	2021-01-01T00:00:00.000Z	2021-01-05T23:59:00.000Z	2021-01-03T00:00:00.000Z
3	2021-01-01T00:00:00.000Z	2021-01-05T23:59:00.000Z	2021-01-04T00:00:00.000Z
4	2021-01-01T00:00:00.000Z	2021-01-05T23:59:00.000Z	2021-01-05T00:00:00.000Z
5	2021-01-01T00:00:00.000Z	2021-01-05T23:59:00.000Z	2021-01-05T23:59:00.000Z

the first time is 2021-01-02T00:00:00.000Z and not 2021-01-01 and the two last entries cover the same day 2021-01-05.

how can I get the entries per day as below using flux:

2021-01-01 - 1
2021-01-02 - 2
2021-01-03 - 3
2021-01-04 - 4
2021-01-05 - 5

Solution

You simply need to change/add the timeSrc parameter in your aggregateWindow window function. By default, Flux will use _stop as value of the parameter, creating "high" bound windows.

Let's make your example a bit simpler with the time not being exact at day end/start for better illustration: If for instance, it is 22:20 system time, I issue a query with range "-1h" and group by 1h, Flux will create two tumbling time windows: 21:00-22:00 and 22:00-23:00. In fact, Flux detects its range and truncates those global time windows to 21:20-22:00 and 22:00-22:20.

If now, aggregateWindow timeSrc is set to _stop, the time column will be 22:00 and 22:20. [the end of each previously mentioned time windows]. If you chose "_start", the time column will be 21:20 and 22:00.

If you want 21:00 and 22:00 in the above example, I suggest adjusting your range to something like: range(start: date.truncate(t: -24h, unit: 1h) which usually also makes more sense that having a first bucket also contains all the data from that bucket.