Search code examples
sqlselectgreatest-n-per-groupwindow-functionssnowflake-cloud-data-platform

SQL query to select up to N records per criteria


I was wondering if it's possible to use SQL (preferably snowflake) to select up to N records given certain criteria. To illustrate: Lets say I have a table with 1 million records, containing full names and phone numbers.

There's no limits on the amount of phone numbers that can be assigned to X person, but I only want to select up to 10 numbers per person, even if the person has more than 10.

Notice I don't want to select just 10 records, I want the query to return every name in the table, I only want to ignore extra phone numbers when the person already has 10 of them.

Can this be done?


Solution

  • You can use window functions to solve this greatest-n-per-group problem:

    select t.*
    from (
        select 
            t.*, 
            row_number() over(partition by name order by phone_number) rn
        from mytable t
    ) t
    where rn <= 10
    

    Note that you need an ordering column to define what "top 10" actually means. I assumed phone_number, but you can change that to whatever suits your use case best.

    Better yet: as commented by waldente, snowflake has the qualify syntax, which removes the need for a subquery:

    select t.*
    from mytable t
    qualify row_number() over(partition by name order by phone_number) <= 10