Trying to perform a dynamic limit in hive sql via the rank function.
PROBLEM: I want to use the limit from table A against table B to create the output. Example below.
TABLE A:
ID | Limit
------------
123 | 1
456 | 3
789 | 2
TABLE B:
ID | User
-------
123 | ABC
123 | DEF
123 | GHI
456 | JKL
456 | MNO
789 | PQR
789 | RST
OUTPUT:
ID | User
----------
123 | ABC
456 | JKL
456 | MNO
789 | PQR
789 | RST
Unfortunately you cannot do a dynamic limit (as far as I know) in hive sql. So I was trying to use rank. My current query looks like this:
SELECT c.id, c.users, c.rnk
FROM (
SELECT b.id, b.user, a.limit, rank() over (ORDER BY b.id DESC) as rnk
FROM a JOIN b
ON a.id = b.id
) c
WHERE rnk < c.limit;
Currently I get the error:
ParseException line 3:9 cannot recognize input near 'rank' '(' ')' in from source 0
Any ideas why? Or maybe a better approach?
Thanks!
SELECT c.id, c.users, c.rnk
FROM (
SELECT b.id, b.user, a.limit, row_number() over (PARTITION by b.id ORDER BY b.id ) as rn
FROM a JOIN b
ON a.id = b.id
) c
WHERE rn <= c.limit;
In the above query row_number()
will number rows after join, filter in the where
clause will work as limit. ORDER BY
is not necessary for simply limiting rows without any preference, replace ORDER BY
with your rule, for example order by user.