I want to represent a string column as a binary 1 or 0 by pivoting the string column and making its values as header using SQL (Snowflake). It would python equivalent of pd.get_dummies
where the function transform categorical column in to one-hot encoded columns. How do I do that?
Sample Data:
id | value |
---|---|
A | 'G802' |
A | 'R620' |
A | '' |
B | 'J209' |
B | 'B009' |
C | '' |
C | 'R509' |
Expected Output:
id | G802 | R620 | J209 | B009 | R509 |
---|---|---|---|---|---|
A | 1 | 1 | 0 | 0 | 0 |
B | 0 | 0 | 1 | 1 | 0 |
C | 0 | 0 | 0 | 0 | 1 |
I need the query to be dynamic as I have more and random values in the value
column. The order of the column can be anything. The ''
value in value
can or cannot be part of the pivot table. I have tried a few options with the pivot
function but no luck. I very much appreciate your help. Thank you!
It is possible to achieve it with SQL and dynamic PIVOT:
SELECT *
FROM (SELECT id, VALUE, IFF(value='',0, 1) AS VALUE2 FROM t)
PIVOT (MAX(VALUE2) FOR VALUE IN (SELECT VALUE FROM t WHERE VALUE != '')
DEFAULT ON NULL (0));
For test data:
CREATE OR REPLACE TABLE t(id TEXT, VALUE TEXT) AS
SELECT 'A', 'G802' UNION ALL
SELECT 'A', 'R620' UNION ALL
SELECT 'A', '' UNION ALL
SELECT 'B', 'J209'UNION ALL
SELECT 'B', 'B009'UNION ALL
SELECT 'C', '' UNION ALL
SELECT 'C', 'R509';
Output:
How does it work:
IFF(value='',0, 1) AS VALUE2
helper column to categorize data 0 or 1PIVOT MAX(VALUE2)
- result of pivot based on helper columnVALUE IN (SELECT VALUE FROM t WHERE VALUE != '')
- dynamic pivot, skipping ''
''
then simply VALUE IN (ANY)
Using VALUE IN (ANY)
and EXCLUDE
:
SELECT * EXCLUDE "''"
FROM (SELECT id, VALUE, IFF(value='',0, 1) AS VALUE2 FROM t)
PIVOT (MAX(VALUE2) FOR VALUE IN (ANY) DEFAULT ON NULL (0));