Search code examples
sqlimpala

In SQL (Impala) how to calculate time delta between rows that satisfies condition and next one?


I have a transactional data table, with a structure similar to this:

User          Event                    TimeStamp
 u1         listen_music      2017-10-18 13:28:43
 u1         click_btn         2017-10-18 13:28:53
 u1         logout            2017-10-18 13:29:55
 u2         login             2017-10-19 13:30:20
 u2         listen_music      2017-10-19 13:33:25
 u2         visit_home        2017-10-19 13:34:10
 u2         listen_music      2017-10-19 13:36:15
 u2         listen_music      2017-10-19 13:37:35
 u2         visit_profile     2017-10-19 13:40:35

I would like to know how long did user listen to music. So the output would be something like:

User        No    Delta
 u1         1      10 sec
 u2         1      45 sec
 u2         2      80 sec
 u2         3      180 sec

I don't want to calculate all time deltas and then filter to 'listen_music' events, as there are many other (irrelevant) events.

So actually I want to calculate time delta between a row that satisfies a condition and the next row for all users.

I'm using Impala, but other solutions are also welcomed.


Solution

  • You can use lead():

    select 
        user, 
        row_number() over(partition by user order by timestamp) no,
        delta
    from (
        select 
            t.*,
            unixtimestamp(lead(timestamp) over(partition by user order by timestamp))
                - unixtimestamp(timestamp) delta
        from mytable t
    ) t
    where event = 'listen_music'
    order by user, no;
    

    The subquery computes the difference between the current and the next record of the same user. The outer query filters on the target event and numbers the records using row_number().