Search code examples
sqldatabasesnowflake-cloud-data-platformrow-numberwith-statement

How to find the number of events for the first 24 hours for each user id


I'm working on snowflake to solve a problem. I wanted to find the number of events for the first 24 hours for each user id.

This is a snippet of the database table I'm working on. I modified the table and used a date format without the time for simplification purposes.

user_id client_event_time
1 2022-07-28
1 2022-07-29
1 2022-08-21
2 2022-07-29
2 2022-07-30
2 2022-08-03

I used the following approach to find the minimum event time per user_id.

SELECT user_id, client_event_time,
       ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY client_event_time) row_number,
       MIN(client_event_time) OVER (PARTITION BY user_id) MinEventTime
FROM Data
ORDER BY user_id, client_event_time;
user_id client_event_time row_number MinEventTime
1 2022-07-28 1 2022-07-28
1 2022-07-29 2 2022-07-28
1 2022-08-21 3 2022-07-28
2 2022-07-29 1 2022-07-29
2 2022-07-30 2 2022-07-29
2 2022-08-03 3 2022-07-29

Then I tried to find the difference between the minimum event time and client_event_time, and if the difference is less than or equal to 24, I counted the client_event_time.

with NewTable as (
        (SELECT user_id,client_event_time, event_type,
        row_number() over (partition by user_id order by CLIENT_EVENT_TIME) row_number,
        MIN(client_event_time) OVER (PARTITION BY user_id) MinEventTime
        FROM Data
        ORDER BY user_id, client_event_time))
    
SELECT user_id,  
        COUNT(case when timestampdiff(hh, client_event_time, MinEventTime) <= 24  then 1 else 0 end) AS duration
FROM    NEWTABLE
GROUP BY user_id

I got the following result:

user_id duration
1 3
2 3

I wanted to find the following result:

user_id duration
1 2
2 2

Could you please help me solve this problem? Thanks!


Solution

  • Easier done with a qualify

    with cte as
    (select *
     from mytable
     qualify event_time<=min(event_time) over (partition by user_id) + interval '24 hours')
    
    select user_id, count(*) as counts
    from cte
    group by user_id