I'm using Impala. My table has separate columns for the date (in the format YYYYMMDD as STRING), hour and minutes (as INT) For example, I get:
date_part hour_part minute_part kpi
20200828 10 10 23
20200828 10 15 64
20200828 10 20 18
To combine these columns, I have used the following syntax:
MINUTES_ADD(HOURS_ADD(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(CAST(date_part AS STRING), "yyyyMMdd"))), hour_part), minute_part) AS date_time
Therefore I get:
date_time kpi
2020-08-28 10:10:00 23
2020-08-28 10:15:00 64
2020-08-28 10:20:00 18
(please note that my data only is populated each 5 minutes bin)
Now, if I want to calculate a AVG(kpi) over 1 hour, I can do the following GROUP BY:
SELECT
MINUTES_ADD(HOURS_ADD(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(CAST(date_part AS STRING), "yyyyMMdd"))), hour_part), minute_part) AS date_time,
AVG(kpi)
FROM table
GROUP BY
DATE_PART('year', date_time),
DATE_PART('month', date_time),
DATE_PART('day', date_time),
DATE_PART('hour', date_time)
However, how can I get the query to give an aggregation over 15 minutes?
You can use arithmetic on minutes:
SELECT MINUTES_ADD(HOURS_ADD(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(CAST(date_part AS STRING), "yyyyMMdd"))), hour_part), FLOOR(minute_part / 15)) AS date_time,
AVG(kpi)
FROM table
GROUP BY date_part, hour_part,
FLOOR(minute_part / 15)