Semantic exception error in HIVE while using last_value window function


I have a table with the following data:

dt  device  id  count
2018-10-05  computer    7541185957382   6
2018-10-20  computer    7541185957382   3
2018-10-14  computer    7553187775734   6
2018-10-17  computer    7553187775734   10
2018-10-21  computer    7553187775734   2
2018-10-22  computer    7549187067178   5
2018-10-20  computer    7553187757256   3
2018-10-11  computer    7549187067178   10

I want to get the first and last dt for each id, so I tried the window functions first_value and last_value. The last_value query looks like this:

select id,last_value(dt) over (partition by id order by dt) last_dt
from table
order by id
;

But I am getting this error:

FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
Underlying error: Primitve type DATE not supported in Value Boundary expression

I am not able to diagnose the problem, and I would appreciate any help.


Solution

  • Add an explicit rows between clause to the window specification and the query works:

    hive> select id,last_value(dt) over (partition by id order by dt 
          rows between unbounded preceding and unbounded following) last_dt 
          from table order by id;
    

    Result:

    +----------------+-------------+--+
    |       id       |   last_dt   |
    +----------------+-------------+--+
    | 7541185957382  | 2018-10-20  |
    | 7541185957382  | 2018-10-20  |
    | 7549187067178  | 2018-10-22  |
    | 7549187067178  | 2018-10-22  |
    | 7553187757256  | 2018-10-20  |
    | 7553187775734  | 2018-10-21  |
    | 7553187775734  | 2018-10-21  |
    | 7553187775734  | 2018-10-21  |
    +----------------+-------------+--+
    

    There is a Jira issue regarding support for this primitive type in window boundaries; it was fixed in Hive 2.1.0.
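
    As far as I understand, the explicit frame helps because a window with an order by but no frame defaults to range between unbounded preceding and current row, and it is that value-based (range) boundary that rejects DATE in older Hive versions. The explicit frame is also needed for correctness: with the default frame, last_value only sees rows up to the current one, so it would not return the last dt of the whole partition. Since the question asks for both the first and the last dt, here is a minimal sketch that applies the same frame to first_value. The table name so is borrowed from the second query in this answer, and the alias first_dt is my own assumption.

    ```sql
    -- Sketch only: `so` is the table name used in the answer's second query;
    -- `first_dt` is an illustrative alias, not part of the original post.
    select id,
           first_value(dt) over (partition by id order by dt
               rows between unbounded preceding and unbounded following) as first_dt,
           last_value(dt) over (partition by id order by dt
               rows between unbounded preceding and unbounded following) as last_dt
    from so
    order by id;
    ```

    Like the query above, this still returns one row per input row, so each id appears as many times as it has rows.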

    UPDATE:

    To get one row per id, add the ROW_NUMBER window function and keep only the first row of each partition.

    hive> select id, last_dt from
              (select id,
                      last_value(dt) over (partition by id order by dt
                          rows between unbounded preceding and unbounded following) last_dt,
                      row_number() over (partition by id order by dt) rn
                 from so) t
           where t.rn = 1;
    

    Result:

    +----------------+-------------+--+
    |       id       |   last_dt   |
    +----------------+-------------+--+
    | 7541185957382  | 2018-10-20  |
    | 7553187757256  | 2018-10-20  |
    | 7553187775734  | 2018-10-21  |
    | 7549187067178  | 2018-10-22  |
    +----------------+-------------+--+
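
If only the first and last dt per id are needed, and no other columns from the matching rows, a plain aggregate may be simpler than window functions and avoids the window boundary issue entirely. This is a different technique from the one in the answer above, shown here as a sketch. It assumes the same so table and that dt sorts chronologically (a DATE column, or a string in yyyy-MM-dd format); the aliases first_dt and last_dt are illustrative.

```sql
-- Alternative sketch, not the answer's original approach:
-- one row per id, with the earliest and latest dt.
-- Assumes dt is a DATE or a yyyy-MM-dd string, so min/max order chronologically.
select id,
       min(dt) as first_dt,
       max(dt) as last_dt
from so
group by id;
```

With the sample data from the question, id 7553187775734 would come back with first_dt 2018-10-14 and last_dt 2018-10-21. Row order is not guaranteed without an order by clause.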