I have following HQL
SELECT count(*) OVER (PARTITION BY identity.hwid, passwordused.domain ORDER BY event.time ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) pocet,
min(event.time) OVER (PARTITION BY identity.hwid, passwordused.domain ORDER BY event.time ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) minTime,
max(event.time) OVER (PARTITION BY identity.hwid, passwordused.domain ORDER BY event.time ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) maxTime
FROM t21_pam6
How can I define the 3 same WINDOW clauses into one?
The documentation (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics ) shows this example
SELECT a, SUM(b) OVER w
FROM T;
WINDOW w AS (PARTITION BY c ORDER BY d ROWS UNBOUNDED PRECEDING)
But I don't think it's working. It's not possible to define WINDOW w as... is not a HQL command.
This type of optimization is something that the compiler would need to do. I don't think there is a way to ensure this programmatically.
That said, the calculation for the minimum time is totally unnecessary. Because of the order by
, it should be the time in the current row. Similarly, if you can handle null
values, then the expression can be simplified to:
SELECT count(*) OVER (PARTITION BY identity.hwid, passwordused.domain ORDER BY event.time ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) pocet,
event.time as minTime,
lead(event.time, 2) OVER (PARTITION BY identity.hwid, passwordused.domain ORDER BY event.time) as maxTime
FROM t21_pam6;
Note that the maxtime
calculation is slightly different because it will return NULL
for the last two values matching the conditions.