I have a dataset like
user_id | value
1111 NULL
1111 active
2222 active
I want to group by and get the first available value for each user so I do
select
user_id,
COLLECT_LIST(value) [0]
from table;
which doesn't work as for 11111
the value NULL
is shown.
I try
select
user_id,
COALESCE(value)
from table;
But that returns an array not a string. Is there a function in spark SQL which can help me with this
SELECT
user_id,
first(value, true) AS first_value
FROM
table
GROUP BY
user_id;
The first(value, true)
function takes two parameters: the column to analyze (value
) and a boolean indicating whether to ignore null values (true
). This should work as intended, returning the first available non-null entry for each user. If all entries for a user are null
, it returns null
.