Search code examples
sqlgreatest-n-per-groupsnowflake-cloud-data-platformmaxby

How can I get the corresponding value of a column based on an aggregate of another column?


I am migrating some PostgreSQL queries to Snowflake and I am struggling to replicate the following query:

WITH test_table(col1, col2, col3, col4) AS (
    VALUES 
        (1, 1, 7, 5),
        (1, 1, 6, 4),
        (1, 2, 1, 4),
        (1, 2, 2, 5),
        (1, 2, 3, 6)
)

SELECT
    col1,
    col2,
    (max(ARRAY[col3, col4]))[2]
FROM test_table
GROUP BY 1, 2

This query manages to get the value of col4 that corresponds to the maximum value of col3 grouped by col1, col2. However, I do not find any elegant solution to replicate this on Snowflake.


Solution

  • You can aggregate an array by any order. So you can group col3 ordered by col4:

    with test_table(col1, col2, col3, col4) 
    as (
        select * from (values 
            (1, 1, 20, 1), -- added 
            (1, 1, 7, 5),
            (1, 1, 6, 4),
            (1, 2, 1, 4),
            (1, 2, 2, 5),
            (1, 2, 3, 6))
    )
    
    SELECT
        col1,
        col2,
        array_agg(col4) within group(order by col3 desc) c4_by_max_c3
    FROM test_table
    GROUP BY 1, 2
    

    enter image description here

    Instead of getting this array, you can just pick the top number with [0]:

    SELECT
        col1,
        col2,
        array_agg(col4) within group(order by col3 desc)[0] c4_by_max_c3
    FROM test_table
    GROUP BY 1, 2
    

    enter image description here

    Hopefully Snowflake will also support max_by in the future, but it's not yet implemented. Meanwhile this will work.


    2023 Update: MAX_BY() is supported

    https://docs.snowflake.com/en/sql-reference/functions/min_by.html