Search code examples
sqlgroup-byapache-spark-sqlhive

Get error when using CASE statement with GROUP BY in Hive


I have data like

user_id status name
11111 active John
11111 rejected Jai
22222 active Prince
33333 rejected King

I want output as

user_id active_name rejected_name
11111 John Jai
22222 Prince null
33333 null King

i.e. aggregate on user_id and get names based on status

I do

SELECT
    user_id,
    CASE WHEN status = 'active' THEN name ELSE NULL END AS active_name,
    CASE WHEN status = 'rejected' THEN name ELSE NULL END AS rejected_name
FROM table
GROUP BY user_id

But this throws an error. Is there a better way to do it?


Solution

  • You need aggregation functions for every column that is not in the GROUP BY

    For you MAX is the right choice, but of course there many others for man occations

    SELECT
        user_id,
        MAX(CASE WHEN status = 'active' THEN name ELSE NULL END) AS active_name,
        MAX(CASE WHEN status = 'rejected' THEN name ELSE NULL END) AS rejected_name
    FROM table
    GROUP BY user_id