I have been wrangling with a time grouping issue in influxDB using the flux query language. I can illustrate with generated data, one entry per day from 2021-01-01 to 2021-01-05.
import "generate"
data = generate.from(
count: 5,
fn: (n) => n + 1,
start: 2021-01-01T00:00:00Z,
stop: 2021-01-06T00:00:00Z,
)
data
|> range(start: 2021-01-01T00:00:00Z, stop: 2021-01-05T05:00:00Z)
that generates:
_table | _value | _start | _stop | _time |
---|---|---|---|---|
0 | 1 | 2021-01-01T00:00:00.000Z | 2021-01-05T23:59:00.000Z | 2021-01-01T00:00:00.000Z |
0 | 2 | 2021-01-01T00:00:00.000Z | 2021-01-05T23:59:00.000Z | 2021-01-02T00:00:00.000Z |
0 | 3 | 2021-01-01T00:00:00.000Z | 2021-01-05T23:59:00.000Z | 2021-01-03T00:00:00.000Z |
0 | 4 | 2021-01-01T00:00:00.000Z | 2021-01-05T23:59:00.000Z | 2021-01-04T00:00:00.000Z |
0 | 5 | 2021-01-01T00:00:00.000Z | 2021-01-05T23:59:00.000Z | 2021-01-05T00:00:00.000Z |
now I query the data and aggregate per day
import "generate"
data = generate.from(
count: 5,
fn: (n) => n + 1,
start: 2021-01-01T00:00:00Z,
stop: 2021-01-06T00:00:00Z,
)
data
|> range(start: 2021-01-01T00:00:00Z, stop: 2021-01-05T23:59:00Z)
|> aggregateWindow(every: 1d, fn: sum, createEmpty: false)
I get this
_table | _value | _start | _stop | _time |
---|---|---|---|---|
0 | 1 | 2021-01-01T00:00:00.000Z | 2021-01-05T23:59:00.000Z | 2021-01-02T00:00:00.000Z |
0 | 2 | 2021-01-01T00:00:00.000Z | 2021-01-05T23:59:00.000Z | 2021-01-03T00:00:00.000Z |
0 | 3 | 2021-01-01T00:00:00.000Z | 2021-01-05T23:59:00.000Z | 2021-01-04T00:00:00.000Z |
0 | 4 | 2021-01-01T00:00:00.000Z | 2021-01-05T23:59:00.000Z | 2021-01-05T00:00:00.000Z |
0 | 5 | 2021-01-01T00:00:00.000Z | 2021-01-05T23:59:00.000Z | 2021-01-05T23:59:00.000Z |
the first time is 2021-01-02T00:00:00.000Z
and not 2021-01-01
and the two last entries cover the same day 2021-01-05
.
how can I get the entries per day as below using flux:
You simply need to change/add the timeSrc
parameter in your aggregateWindow
window function. By default, Flux will use _stop
as value of the parameter, creating "high" bound windows.
Let's make your example a bit simpler with the time not being exact at day end/start for better illustration: If for instance, it is 22:20 system time, I issue a query with range "-1h" and group by 1h, Flux will create two tumbling time windows: 21:00-22:00 and 22:00-23:00. In fact, Flux detects its range and truncates those global time windows to 21:20-22:00 and 22:00-22:20.
If now, aggregateWindow timeSrc is set to _stop, the time column will be 22:00 and 22:20. [the end of each previously mentioned time windows]. If you chose "_start", the time column will be 21:20 and 22:00.
If you want 21:00 and 22:00 in the above example, I suggest adjusting your range to something like: range(start: date.truncate(t: -24h, unit: 1h)
which usually also makes more sense that having a first bucket also contains all the data from that bucket.