I was wondering if it's possible to use SQL (preferably snowflake) to select up to N records given certain criteria. To illustrate: Lets say I have a table with 1 million records, containing full names and phone numbers.
There's no limits on the amount of phone numbers that can be assigned to X person, but I only want to select up to 10 numbers per person, even if the person has more than 10.
Notice I don't want to select just 10 records, I want the query to return every name in the table, I only want to ignore extra phone numbers when the person already has 10 of them.
Can this be done?
You can use window functions to solve this greatest-n-per-group problem:
select t.*
from (
select
t.*,
row_number() over(partition by name order by phone_number) rn
from mytable t
) t
where rn <= 10
Note that you need an ordering column to define what "top 10" actually means. I assumed phone_number
, but you can change that to whatever suits your use case best.
Better yet: as commented by waldente, snowflake has the qualify
syntax, which removes the need for a subquery:
select t.*
from mytable t
qualify row_number() over(partition by name order by phone_number) <= 10