Search code examples
sqldatabasepartitioningimpalaqsqlquery

AVG over time Window in Impala ... OVER (PARTITION BY ... ORDER BY)


I have a Table in Impala in which I have time information as UnixTime with a frequency of 1mSec. I am trying to get the AVG(), MIN() and MAX() for a window of 10Sec (But I do not want to fix it and can be 20sec, 30sec, etc).

I am doing it using sub-queries but I am not getting the right answer. The following is my Data in a Table: Data in the Table

I am using the following sub-query to get the AVG(), MIN() and MAX() for a window of 10 seconds. I am using OVER (PARTITION BY ... ORDER BY) but not getting the correct results. My query is as follow:

SELECT DISTINCT *
FROM
(SELECT ts,
last_value(Table1.val1) OVER (PARTITION BY Table1.ts ORDER BY Table1.ts rows between unbounded preceding and unbounded following) as val1,
AVG(Table1.val2) OVER (PARTITION BY Table1.ts ORDER BY Table1.ts rows between unbounded preceding and unbounded following) as val2
MIN(Table1.val3) OVER (PARTITION BY Table1.ts ORDER BY Table1.ts rows between unbounded preceding and unbounded following) as val2
MAX(Table1.val4) OVER (PARTITION BY Table1.ts ORDER BY Table1.ts rows between unbounded preceding and unbounded following) as val2
FROM (SELECT cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/10 as bigint)*10 as TIMESTAMP) as ts , 
val1 as val1, 
val2 as val2, 
val3 as val3, 
val4 as val4 
FROM Sensor_Data.Table where unit='Unit1'  
and cast(ts/1000 as TIMESTAMP) BETWEEN '2020-11-29 22:30:00' and '2020-12-01 01:51:00') as Table1) as Table2
ORDER BY ts

I need the following answer:

Time                    Val1        Val2        Val3        Val4
2020-11-29 22:30:00     last_value  AVG         MIN         MAX
2020-11-29 22:30:10     last_value  AVG         MIN         MAX
2020-11-29 22:30:20     last_value  AVG         MIN         MAX

Could anybody tell me that what is wrong in my Impala Query.

Thanks !!!


Solution

  • I think you just want aggregation, not window functions:

    SELECT cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/10 as bigint)*10 as TIMESTAMP),
           AVG(val2) as val2,
           MIN(val3) as val3,
           MAX(val4) as val4
    FROM Sensor_Data.Table
    WHERE unit = 'Unit1' AND
          CAST(ts/1000 as TIMESTAMP) BETWEEN '2020-11-29 22:30:00' and '2020-12-01 01:51:00'
    GROUP BY cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/10 as bigint)*10 as TIMESTAMP)