Search code examples
sqlimpala

How to select userIds that have same siteId throughout the 5 hrs of time period?


I have the following table,

userId siteId logged_time
100001 7890 31-01-2021 11:55
100002 7878 31-01-2021 12:27
100001 7890 01-02-2021 01:05
100004 7878 01-02-2021 11:55
100002 7848 01-02-2021 11:30
100001 7890 02-02-2021 13:34

What I need to do is, select all users where they have logged on to the same site for 5 hrs of first logging. My current approach is

  • get each user's first logged_time and add 5 hours to it.
  • grab all the results of each user that belong to 5 hours interval.
  • get the distinct siteId count for each user
  • If distinct siteId count for each user equals 1 then grab those users else reject that user

I think my approach is quite good. But I can't figure out how to do that using impala and SQL. Any help would be highly appreciated! Thanks.


Solution

  • SELECT *
    FROM src t1
    WHERE NOT EXISTS ( SELECT NULL
                       FROM src t2
                       WHERE t1.userId = t2.userId
                         AND t1.siteId != t2.siteId
                         AND t2.logged_time BETWEEN logged_time 
                                                AND logged_time + INTERVAL 5 HOUR )