Here is my data set:
IDX SESSION_ID
1 Null
2 Null
3 Foo
4 Foo
5 Foo
6 Null
7 Bar
8 Bar
I would like to add a flag column which equals 1 if my line is a new session (I define by new session the fact that the session ID change when I order by IDX)
In that case, the output would be :
IDX SESSION_ID N_Session
1 Null 1
2 Null 0
3 Foo 1
4 Foo 0
5 Foo 0
6 Null 1
7 Bar 1
8 Bar 0
How can I do that using impala sql ? (ANSI SQL should be fine too I guess)
Try this query:
SELECT t1.IDX,
t1.SESSION_ID,
CASE WHEN t1.IDX = 1 OR
t1.SESSION_ID IS NULL AND t2.SESSION_ID IS NOT NULL OR
t1.SESSION_ID IS NOT NULL AND t2.SESSION_ID IS NULL OR
COALESCE(t1.SESSION_ID, 'a') <> COALESCE(t2.SESSION_ID, 'a')
THEN 1
ELSE 0 END AS N_Session
FROM yourTable t1
LEFT JOIN yourTable t2
ON t1.IDX = t2.IDX + 1
Here is a table showing what the temporary result from the join should look like:
IDX SESSION_ID IDX2 SESSION_ID2 N_Session
1 Null NULL NULL 1 <-- first row, 1
2 Null 1 NULL 0 <-- session values agree, 0
3 Foo 2 NULL 1 <-- values different, 1
4 Foo 3 Foo 0
5 Foo 4 Foo 0
6 Null 5 Foo 1 <-- values different, 1
7 Bar 6 NULL 1
8 Bar 7 Bar 0
It should be clear that we want to mark N_Session
with 1 in one of the following two cases:
IDX
value of 1)The verbosity of my query arises from having to handle NULL
values. If I read your logic correctly, two NULL
values when compared should actually be treated as the same value, which may not be the case with Impala SQL using certain operators.