Search code examples
sqlimpalaunfold

SQL - Impala - How to unfold one categorical column into many?


I have the following table :

user      category    number
1         A           8
1         B           6
2         A           1
2         C           9
3         B           5

I want to "unfold" or "dummify" the category column and fill them with the "number" column to obtain:

user      cat_A    cat_B    cat_C
1         8        6        0
2         1        0        9
3         0        5        0

Is it possible to achieve this in SQL (Impala) ?

I found this question How to create dummy variable columns for thousands of categories in Google BigQuery?

However it seems a little bit complex and I'd rather do it in Pandas.

Is there a simpler solution, knowing that I have 10 categories (A, B, C, D etc)?


Solution

  • You can try to use condition aggregate function.

    SELECT user,
            SUM(CASE WHEN category = 'A' THEN number ELSE 0 END) cat_A,  
            SUM(CASE WHEN category = 'B' THEN number ELSE 0 END) cat_B,  
            SUM(CASE WHEN category = 'C' THEN number ELSE 0 END) cat_C     
    FROM T 
    GROUP BY user