How would you approach this via SQL? Let's take this example
| id | type | score_a | score_b | score_c | label_a | label_b | label_c |
|----|------|---------|---------|---------|---------|---------|---------|
| 1 | A | 0.9 | | | L1 | | |
| 1 | B | | 0.7 | | | L2 | |
| 1 | B | | 0.2 | | | L3 | |
| 1 | C | | | 0.2 | | | L4 |
| 1 | C | | | 0.18 | | | L5 |
| 1 | C | | | 0.12 | | | L6 |
| 2 | A | 0.6 | | | L1 | | |
| 2 | A | 0.3 | | | L2 | | |
I want to return the max score per type
in conjunction with the label_X
, Almost, like a pivot table but with these custom column names. So the outcome of the above will be like:
| id | type | score_a | label_a | score_b | label_b | score_c | label_c |
|----|------|---------|---------|---------|---------|---------|---------|
| 1 | A | 0.9 | L1 | 0.7 | L2 | 0.2 | L4 |
| 2 | A | 0.6 | L1 | NULL | NULL | NULL | NULL |
Something like this is wrong as it yields both results per type
per label
SELECT id,
MAX(score_a) as score_a,
label_a,
MAX(score_b) as score_b,
label_b as label_b,
MAX(score_c) as score_c,
label_c
FROM sample_table
GROUP BY id, label_a, label_b, label_c
Is there an easy way to do this via SQL, I'm doing it right now from BigQuery and tried also pivot table as described here but still no luck on how to flatten these into one big row with several columns
Any other ideas?
UPDATE
Expanding on what BGM mentioned about design; the source of this data is a table with the following form:
| id | type | label | score |
|----|------|-------|-------|
| 1 | A | L1 | 0.9 |
| 1 | B | L2 | 0.7 |
| 1 | B | L3 | 0.2 |
| 1 | C | L4 | 0.6 |
| 1 | C | L5 | 0.2 |
That gets converted to a flattened state as depicted at the top of this question using a query like
SELECT id,
type,
MAX(CASE WHEN type = 'A' THEN score ELSE 0 END) as score_a,
MAX(CASE WHEN type = 'B' THEN score ELSE 0 END) as score_b,
MAX(CASE WHEN type = 'C' THEN score ELSE 0 END) as score_c,
MAX(CASE WHEN model_type = 'theme' THEN label_score ELSE 0 END) as
-- labels
(CASE WHEN type = 'A' THEN label ELSE '' END) as label_a,
(CASE WHEN type = 'B' THEN label ELSE '' END) as label_b,
(CASE WHEN type = 'C' THEN label ELSE '' END) as label_c,
FROM table
GROUP id, label_a, label_b, label_c
Do you think the intermediate step is unnecessary to get to the final solution?
You can do conditional aggregation. In Big Query, arrays come handy for this:
select
id,
max(score_a) score_a,
array_agg(label_a order by score_a desc limit 1)[offset(0)] label_a,
max(score_b) score_b,
array_agg(label_b order by score_b desc limit 1)[offset(0)] label_b,
max(score_c) score_c,
array_agg(label_c order by score_c desc limit 1)[offset(0)] label_c
from mytable
group by id
Note: in terms of design, you should not have multiple columns to store the scores and labels per types; you already have a column that represents the types, so you should have just two columns for the store and type.