I have a table that looks like this. And I want to get the distinct count horizontally across the three columns ignoring nulls.
ID | Column1 | Column 2 | Column 3 |
---|---|---|---|
1 | A | B | C |
2 | A | A | B |
3 | A | A |
The desired output I'm looking for is:
ID | Column1 | Column 2 | Column 3 | unique_count |
---|---|---|---|---|
1 | A | B | C | 3 |
2 | A | A | B | 2 |
3 | A | A | 1 |
One possible option would be
WITH sample AS (
SELECT 'A' Column1, 'B' Column2, 'C' Column3 UNION ALL
SELECT 'A', 'A', 'B' UNION ALL
SELECT 'A', 'A', NULL UNION ALL
SELECT '', 'A', NULL
)
SELECT Column1, Column2, Column3, COUNT(DISTINCT NULLIF(TRIM(c), '')) unique_count
FROM (SELECT *, ROW_NUMBER() OVER () rn FROM sample) t LATERAL VIEW EXPLODE(ARRAY(Column1, Column2, Column3)) tf AS c
GROUP BY Column1, Column2, Column3, rn;
+---------+---------+---------+--------------+
| column1 | column2 | column3 | unique_count |
+---------+---------+---------+--------------+
| | A | NULL | 1 |
| A | A | NULL | 1 |
| A | A | B | 2 |
| A | B | C | 3 |
+---------+---------+---------+--------------+