I have the following transaction
table:
customer_id, category, product_id, score
I'm grouping by customer_id
and category
to create a list of product_id-score
map pairs:
SELECT
s.customer_id,
s.category,
collect_list(s.pair)
FROM
(
SELECT
customer_id,
category,
map(product_id, score) AS pair
FROM
transaction
WHERE
score > {score_threshold}
) s
GROUP BY
s.customer_id,
s.category
Now I want to take this a step further. For each group, I'm looking to retain just the top n
pairs, sorted by score
(descending order).I tried the OVER (PARTITION BY...ORDER BY)
by I'm running into problems.
Note: the transaction
table is partitioned by category
Thanks
Try this:
SELECT
s.customer_id,
s.category,
collect_list(s.pair)
FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY customer_id, category ORDER BY score desc) as RowId
customer_id,
category,
map(product_id, score) AS pair
FROM
transaction
WHERE
score > {score_threshold}
) s
where s.RowId < n
GROUP BY
s.customer_id,
s.category