Search code examples
sqlhiveoracle-sqldeveloperhiveqlrow-number

SQL QUALIFY equivalent HIVE query


I'm trying to create a HIVE query from an Oracle SQL query. Essentially I want to select the first record, sorted descending by UPDATED_TM, DATETIME, ID_NUM.

SELECT 
  tbl1.NUM AS ID,
  tbl1.UNIT AS UNIT,  
  tbl2.VALUE AS VALUE,
  tbl1.CONTACT AS CONTACT_NAME,
  'FILE' AS SOURCE,
  CURDATE() AS DATE
FROM
  DB1.TBL1 tbl1
  LEFT JOIN DB1.TBL2 tbl2 ON tbl1.USR_ID = tbl2.USR_ID
WHERE
  tbl1.UNIT IS NOT NULL
  AND tbl1.TYPE = 'Generic'
QUALIFY 
  ROW_NUMBER() OVER (PARTITION BY tbl1.ROW_ID ORDER BY tbl1.UPDATED_TM DESC, tbl1.DATETIME DESC, tbl1.ID_NUM DESC) = 1

And my attempt at an equivalent Hive query (but also sql-compatible):

SELECT 
  tbl1.NUM AS ID,
  tbl1.UNIT AS UNIT,  
  tbl2.VALUE AS VALUE,
  tbl1.CONTACT AS CONTACT_NAME,
  'FILE' AS SOURCE,
  CURDATE() AS DATE
FROM (
  SELECT 
    USR_ID, TYPE, NUM, UNIT, ROW_NUMBER() OVER (PARTITION BY tbl1.ROW_ID ORDER BY tbl1.UPDATED_TM DESC, tbl1.DATETIME DESC, tbl1.ID_NUM DESC) AS RNUM
  FROM 
    DB1.TBL1
  ) tbl1
  LEFT JOIN DB1.TBL2 tbl2 ON tbl1.USR_ID = tbl2.USR_ID
WHERE
  tbl1.RNUM = 1
  AND tbl1.UNIT IS NOT NULL
  AND tbl1.TYPE = 'Generic'

Does that seem correct? Is there any way I can optimize the query? The tables I'm working with are quite large and I would like to make this as efficient as possible.

Thanks.


Solution

  •     SELECT 
      tbl1.NUM AS ID,
      tbl1.UNIT AS UNIT,  
      tbl2.VALUE AS VALUE,
      tbl1.CONTACT AS CONTACT_NAME,
      'FILE' AS SOURCE,
      CURDATE() AS DATE
    FROM
    (
    SELECT 
        USR_ID, TYPE, NUM, UNIT, ROW_NUMBER() OVER (PARTITION BY tbl.ROW_ID ORDER BY tbl.UPDATED_TM DESC, tbl.DATETIME DESC, tbl.ID_NUM DESC) AS RNUM
    FROM
        (
            SELECT 
                    USR_ID,TYPE,NUM,UNIT,ROW_ID,UPDATED_TM,DATETIME,ID_NUM 
                FROM DB1.TBL1
            WHERE UNIT IS NOT NULL 
            AND TYPE = 'Generic'
        )tbl
    )tbl1
    LEFT OUTER JOIN
    DB1.TBL2 tbl2
    ON tbl1.USR_ID = tbl2.USR_ID
    WHERE tbl1.RNUM = 1;