Search code examples
sqlansi-sqlsnowflake-cloud-data-platform

Limit column value repeats to top 2


So I have this query:

SELECT 
    Search.USER_ID,
    Search.SEARCH_TERM,
    COUNT(*) AS Search.count
FROM Search

GROUP BY 1,2
ORDER BY 3 DESC

Which returns a response that looks like this:

USER_ID   SEARCH_TERM    count
bob       dog            50
bob       cat            45
sally     cat            38
john      mouse          30
sally     turtle         10
sally     lion           5
john      zebra          3
john      leopard        1

And my question is: How would I change the query, so that it only returns the top 2 most-searched-for-terms for any given user? So in the example above, the last row for Sally would be dropped, and the last row for John would also be dropped, leaving a total of 6 rows; 2 for each user, like so:

USER_ID   SEARCH_TERM    count
bob       dog            50
bob       cat            45
sally     cat            38
john      mouse          30
sally     turtle         10
john      zebra          3

Solution

  • In SQL Server, you can put the original query into a CTE, add the ROW_NUMBER() function. Then in the new main query, just add a WHERE clause to limit by the row number. Your query would look something like this:

    ;WITH OriginalQuery AS
    (
        SELECT 
            s.[User_id]
            ,s.Search_Term
            ,COUNT(*) AS 'count'
            ,ROW_NUMBER() OVER (PARTITION BY s.[USER_ID] ORDER BY COUNT(*) DESC) AS rn
        FROM Search s
        GROUP BY s.[User_id], s.Search_Term
    )
    SELECT oq.User_id
          ,oq.Search_Term
          ,oq.count
    FROM OriginalQuery oq
    WHERE rn <= 2
    ORDER BY oq.count DESC 
    

    EDIT: I specified SQL Server as the dbms I used here, but the above should be ANSI-compliant and work in Snowflake.