Search code examples
sqlimpala

Flag specific lines in impala


Here is my data set:

IDX  SESSION_ID
1    Null
2    Null
3    Foo
4    Foo
5    Foo
6    Null
7    Bar
8    Bar

I would like to add a flag column which equals 1 if my line is a new session (I define by new session the fact that the session ID change when I order by IDX)

In that case, the output would be :

IDX  SESSION_ID  N_Session
1    Null        1
2    Null        0
3    Foo         1
4    Foo         0
5    Foo         0
6    Null        1
7    Bar         1
8    Bar         0

How can I do that using impala sql ? (ANSI SQL should be fine too I guess)


Solution

  • Try this query:

    SELECT t1.IDX,
           t1.SESSION_ID,
           CASE WHEN t1.IDX = 1 OR
                     t1.SESSION_ID IS NULL AND t2.SESSION_ID IS NOT NULL OR
                     t1.SESSION_ID IS NOT NULL AND t2.SESSION_ID IS NULL OR
                     COALESCE(t1.SESSION_ID, 'a') <> COALESCE(t2.SESSION_ID, 'a')
                THEN 1
                ELSE 0 END AS N_Session
    FROM yourTable t1
    LEFT JOIN yourTable t2
        ON t1.IDX = t2.IDX + 1
    

    Here is a table showing what the temporary result from the join should look like:

    IDX  SESSION_ID   IDX2   SESSION_ID2   N_Session
    1    Null         NULL   NULL          1            <-- first row, 1
    2    Null         1      NULL          0            <-- session values agree, 0
    3    Foo          2      NULL          1            <-- values different, 1
    4    Foo          3      Foo           0
    5    Foo          4      Foo           0
    6    Null         5      Foo           1            <-- values different, 1
    7    Bar          6      NULL          1
    8    Bar          7      Bar           0
    

    It should be clear that we want to mark N_Session with 1 in one of the following two cases:

    • the two session IDs do not agree
    • the two session IDs agree, but the row is the first row (IDX value of 1)

    The verbosity of my query arises from having to handle NULL values. If I read your logic correctly, two NULL values when compared should actually be treated as the same value, which may not be the case with Impala SQL using certain operators.