Search code examples
time-seriesaggregate-functionstimescaledbhypertablecontinuous-aggregates

TimescaleDB time_bucket() function giving unexpected results for arbitrary time interval


I have created a hypertable water_meter to store the sensor data

It contains following data ordered by timestamp in ascending order

select * from water_meter order by time_stamp;

click here to see the results of the above query

As can be seen I have data starting from 01 May 2020

if I use time_bucket() function to get aggregates per 1 day as:

SELECT
time_bucket('1 days', time_stamp) as bucket,
thing_key,
avg(pulsel) as avg_pulse_l,
avg(pulseh) as avg_pulse_h
FROM
water_meter
GROUP BY thing_key, bucket;

It works fine and I get below data:

see the results of time_bucket('1 day')

Now if I use it to get 15 days aggregates, I get unexpected results where the starting time bucket is shown for 17 April 2020, for which there was no data in the table

SELECT
time_bucket('15 days', time_stamp) as bucket,
thing_key,
avg(pulsel) as avg_pulse_l,
avg(pulseh) as avg_pulse_h
FROM
water_meter
GROUP BY thing_key, bucket;

click to see the results for time_bucket('15 days')


Solution

  • The time_bucket function buckets things into buckets which have an implied range, ie a 15 minute bucket might appear as '2021-01-01 01:15:00.000+00' or something, but it would contain timestamps in the range ['2021-01-01 01:15:00', '2021-01-01 01:30:00') - inclusive on the left exclusive on the right. The same thing happens for days. The bucket is determined and happens to start on the 17th of April, but will include the data in the range: ["2020-04-17 00:00:00+00","2020-05-02 00:00:00+00"). You can use the experimental function in the TimescaleDB Toolkit extension to get these ranges: SELECT toolkit_experimental.time_bucket_range('15 days'::interval, '2020-05-01');

    You can also use the offset or origin parameters of the time_bucket function to modify the start: select time_bucket('15 days'::interval, '2020-05-01', origin=>'2020-05-01');