I want to fill NULL
values in device
column for each session_id
with an associated non-NULL value. How can I achieve that?
Here is the sample data:
+------------+-------+---------+
| session_id | step | device |
+------------+-------+---------+
| 351acc | step1 | |
| 351acc | step2 | |
| 351acc | step3 | mobile |
| 351acc | step4 | mobile |
| 350bca | step1 | desktop |
| 350bca | step2 | |
| 350bca | step3 | |
| 350bca | step4 | desktop |
+------------+-------+---------+
Desired output:
+------------+-------+---------+
| session_id | step | device |
+------------+-------+---------+
| 351acc | step1 | mobile |
| 351acc | step2 | mobile |
| 351acc | step3 | mobile |
| 351acc | step4 | mobile |
| 350bca | step1 | desktop |
| 350bca | step2 | desktop |
| 350bca | step3 | desktop |
| 350bca | step4 | desktop |
+------------+-------+---------+
The window function first_value()
with the right ordering is probably cheapest:
SELECT session_id, step
, COALESCE(device
, first_value(device) OVER (PARTITION BY session_id ORDER BY device IS NULL, step)
) AS device
FROM tbl
ORDER BY session_id DESC, step;
db<>fiddle here
ORDER BY device IS NULL, step
sorts NULL
values last, so the earliest step
with a notnull value is picked. See:
If notnull devices per session_id
are always the same, you can simplify to just ORDER BY device IS NULL
. And you don't need COALESCE
.