Search code examples
hiveqlranktop-n

Hive - top n records within a group


I am currently using Hive and I have a table with the fields user_id and value. I want to order the values in descending order within each user_id and then only emit the top 100 records for each user_id. This is the code I am attempting to use:

DROP TABLE IF EXISTS mytable2
CREATE TABLE mytable2 AS
SELECT * FROM 
(SELECT *, rank (user_id) as rank
FROM 
(SELECT * from mytable
DISTRIBUTE BY user_id
SORT BY user_id, value DESC)a )b
WHERE rank<101
ORDER BY rank;

However when I run this query, I get the following error:

Error while compiling statement: FAILED: SemanticException [Error 10247]:    Missing over clause for function : rank [ERROR_STATUS]

FYI - My UserIds are alpha-numeric.

Can anyone help?

Thanks in advance.

Add comment


Solution

  • As the error message says, you have error using the rank function, try to add over after rank as following:

    ....
      (SELECT *, rank (user_id) over (order by user_id) as rank
    ....
    

    for further information how to use the rank function you could refer to this documentation