I have a transactional data table, with a structure similar to this:
User Event TimeStamp
u1 listen_music 2017-10-18 13:28:43
u1 click_btn 2017-10-18 13:28:53
u1 logout 2017-10-18 13:29:55
u2 login 2017-10-19 13:30:20
u2 listen_music 2017-10-19 13:33:25
u2 visit_home 2017-10-19 13:34:10
u2 listen_music 2017-10-19 13:36:15
u2 listen_music 2017-10-19 13:37:35
u2 visit_profile 2017-10-19 13:40:35
I would like to know how long did user listen to music. So the output would be something like:
User No Delta
u1 1 10 sec
u2 1 45 sec
u2 2 80 sec
u2 3 180 sec
I don't want to calculate all time deltas and then filter to 'listen_music' events, as there are many other (irrelevant) events.
So actually I want to calculate time delta between a row that satisfies a condition and the next row for all users.
I'm using Impala, but other solutions are also welcomed.
You can use lead()
:
select
user,
row_number() over(partition by user order by timestamp) no,
delta
from (
select
t.*,
unixtimestamp(lead(timestamp) over(partition by user order by timestamp))
- unixtimestamp(timestamp) delta
from mytable t
) t
where event = 'listen_music'
order by user, no;
The subquery computes the difference between the current and the next record of the same user. The outer query filters on the target event and numbers the records using row_number()
.