I have a table like this:
DATE ID ScoreA ScoreB ScoreC
20180101 001 91 92 25
20180101 002 81 82 35
20180101 003 71 52 45
20180102 001 82 15 66
20180102 002 69 67 77
...
20180131 003 88 65 73
Take a month data for example, I want to aggregate them into a report of MAX and MIN score with only one row for each ID
. Just like:
ID ScoreA Date_A ScoreB Date_B ...
001 MAX(ScoreA) MAX(ScoreA).DATE MAX(ScoreB) MAX(ScoreB).DATE ...
002 MAX(ScoreA) MAX(ScoreA).DATE MAX(ScoreB) MAX(ScoreB).DATE ...
003 MAX(ScoreA) MAX(ScoreA).DATE MAX(ScoreB) MAX(ScoreB).DATE ...
Where MAX(ScoreA).DATE
means the DATE
when the corresponding MAX or MIN score appeared (if the MAX score appeared on multiple dates, just pick one randomly)
Not like common seen combine-rows cases, it involves multiple columns at the same time. And since there will be MANY ID
s and HUNDREDS of Score
s (I mean there are ScroeA
ScroreB
... ScoreZ
... Score1
Score2
... Score100
...), I hope to avoid using consuming operations, such as JOIN
table. So any good ideas?
If you want to avoid joins I'd offer such a construction
WITH cte AS (
SELECT DATE, ID, ScoreA, ScoreB, ScoreC,
row_number() over (partition by ID order by ScoreA desc) rnA,
row_number() over (partition by ID order by ScoreB desc) rnB,
row_number() over (partition by ID order by ScoreC desc) rnC,
FROM ...
WHERE DATE BETWEEN ... AND ...
), ids AS (
SELECT DISTINCT ID FROM cte
)
SELECT ID,
(SELECT ScoreA FROM cte t2 WHERE t2.ID = t.ID AND rnA = 1) ScoreA,
(SELECT DATE FROM cte t2 WHERE t2.ID = t.ID AND rnA = 1) Date_A,
(SELECT ScoreB FROM cte t2 WHERE t2.ID = t.ID AND rnB = 1) ScoreB,
(SELECT DATE FROM cte t2 WHERE t2.ID = t.ID AND rnB = 1) Date_B,
(SELECT ScoreC FROM cte t2 WHERE t2.ID = t.ID AND rnC = 1) ScoreC,
(SELECT DATE FROM cte t2 WHERE t2.ID = t.ID AND rnC = 1) Date_C
FROM ids t
When you need date or some other attribute of max/min value it is reasonable to use row numbering instead of aggregating functions: row_number() over (...) as rn
followed by condition rn = 1
UPD
As soon as @TaurusDang wants to have code generation, there is my solution to allow postgres to do almost all the work:
WITH cols AS
(
SELECT column_name
FROM information_schema.columns
WHERE table_schema = 'your_schema'
AND table_name = 'your_table'
AND column_name like 'Score%'
)
-- first part: rows for cte subquery
SELECT ',row_number() over (partition by ID order by ' || column_name || ' desc) rn' || column_name
FROM cols
UNION ALL
-- second part: rows for final query
SELECT ',(SELECT ' || column_name || ' FROM cte t2 WHERE t2.ID = t.ID AND rn' || column_name || ' = 1) ' || column_name || ', (SELECT DATE FROM cte t2 WHERE t2.ID = t.ID AND rn' || column_name || ' = 1) Date_' || column_name
FROM cols
Just copy generated rows into the initial query: the first half to cte and the second one to the main query