Search code examples
sqlhivewindow-functions

Converting Code from Teradata to HIVE Rank Order


I need help with converting the below query from syntax appropriate for teradata to HIVE.

I've tried a copy and past for the subquery but I'm not able to get the qualify clause to work.

CREATE MULTISET VOLATILE TABLE Month_Shifts AS (
    SELECT "Month"
        , Emp_ID
        , Emp_NM
        , MAX(ending_team) OVER (PARTITION BY Emp_ID ORDER BY "Month" ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS Starting_team
        , ending_team
    FROM
        (
            SELECT "Month"
                , Emp_id
                , current_team AS Ending_team
                , COUNT(DISTINCT call_key) AS CallVolume
            FROM data
            GROUP BY 1,2,3
            QUALIFY ROW_NUMBER() OVER (PARTITION BY "month", Emp_ID, Emp_NM ORDER BY CallVolume DESC) = 1
        ) a
    ) WITH DATA NO PRIMARY INDEX ON COMMIT PRESERVE ROWS;

It should be able to run without issue. Currently seeing this error message: FAILED: ParseException line 1:260 missing EOF at 'QUALIFY' near '4'


Solution

  • In Hive, you can just move the condition to the outer query:

    SELECT "Month", Emp_ID, Emp_NM,
           LAG(ending_team) OVER (PARTITION BY Emp_ID ORDER BY "Month") AS Starting_team,
           ending_team
    FROM (SELECT d."month", d.Emp_ID, d.Emp_NM,
                 current_team AS Ending_team,
                 COUNT(DISTINCT call_key) AS CallVolume,
                 ROW_NUMBER() OVER (PARTITION BY "month", Emp_ID, Emp_NM ORDER BY COUNT(DISTINCT call_key) DESC) as seqnum
            FROM data d
            GROUP d."month", d.Emp_ID, d.Emp_NM
        ) d
    WHERE seqnum = 1;
    

    Notes:

    • The QUALIFY is replaced by the WHERE in the outer query.
    • Do not use SELECT * with GROUP BY. List the columns. Regardless of database.
    • Hive supports LAG(), which is more appropriate for the outer SELECT.