Search code examples
sqlpostgresqlgreatest-n-per-groupgreenplum

PostgreSQL - Combining multiple rows with several attributes into one row?


I have a table like this:

DATE        ID    ScoreA    ScoreB    ScoreC
20180101    001   91        92        25
20180101    002   81        82        35
20180101    003   71        52        45
20180102    001   82        15        66
20180102    002   69        67        77
...
20180131    003   88        65        73

Take a month data for example, I want to aggregate them into a report of MAX and MIN score with only one row for each ID. Just like:

ID    ScoreA       Date_A              ScoreB        Date_B            ...
001   MAX(ScoreA)  MAX(ScoreA).DATE    MAX(ScoreB)   MAX(ScoreB).DATE  ...
002   MAX(ScoreA)  MAX(ScoreA).DATE    MAX(ScoreB)   MAX(ScoreB).DATE  ...
003   MAX(ScoreA)  MAX(ScoreA).DATE    MAX(ScoreB)   MAX(ScoreB).DATE  ...

Where MAX(ScoreA).DATE means the DATE when the corresponding MAX or MIN score appeared (if the MAX score appeared on multiple dates, just pick one randomly)

Not like common seen combine-rows cases, it involves multiple columns at the same time. And since there will be MANY IDs and HUNDREDS of Scores (I mean there are ScroeA ScroreB ... ScoreZ ... Score1 Score2 ... Score100 ...), I hope to avoid using consuming operations, such as JOIN table. So any good ideas?


Solution

  • If you want to avoid joins I'd offer such a construction

    WITH cte AS (
        SELECT DATE, ID, ScoreA, ScoreB, ScoreC,
            row_number() over (partition by ID order by ScoreA desc) rnA,
            row_number() over (partition by ID order by ScoreB desc) rnB,
            row_number() over (partition by ID order by ScoreC desc) rnC,
        FROM ...
        WHERE DATE BETWEEN ... AND ...
    ), ids AS (
        SELECT DISTINCT ID FROM cte
    )
    SELECT ID, 
        (SELECT ScoreA FROM cte t2 WHERE t2.ID = t.ID AND rnA = 1) ScoreA, 
        (SELECT DATE FROM cte t2 WHERE t2.ID = t.ID AND rnA = 1) Date_A,
        (SELECT ScoreB FROM cte t2 WHERE t2.ID = t.ID AND rnB = 1) ScoreB, 
        (SELECT DATE FROM cte t2 WHERE t2.ID = t.ID AND rnB = 1) Date_B,
        (SELECT ScoreC FROM cte t2 WHERE t2.ID = t.ID AND rnC = 1) ScoreC, 
        (SELECT DATE FROM cte t2 WHERE t2.ID = t.ID AND rnC = 1) Date_C
    FROM ids t
    

    When you need date or some other attribute of max/min value it is reasonable to use row numbering instead of aggregating functions: row_number() over (...) as rn followed by condition rn = 1

    UPD

    As soon as @TaurusDang wants to have code generation, there is my solution to allow postgres to do almost all the work:

    WITH cols AS
    (
        SELECT column_name
        FROM information_schema.columns
        WHERE table_schema = 'your_schema'
          AND table_name   = 'your_table'
          AND column_name like 'Score%'
    )
    -- first part: rows for cte subquery
    SELECT ',row_number() over (partition by ID order by ' || column_name || ' desc) rn' || column_name
    FROM cols
    UNION ALL
    -- second part: rows for final query
    SELECT ',(SELECT ' || column_name || ' FROM cte t2 WHERE t2.ID = t.ID AND rn' || column_name || ' = 1) ' || column_name || ', (SELECT DATE FROM cte t2 WHERE t2.ID = t.ID AND rn' || column_name || ' = 1) Date_' || column_name
    FROM cols
    

    Just copy generated rows into the initial query: the first half to cte and the second one to the main query