Search code examples
sqloracledatetimewindow-functionsgaps-and-islands

Return first and last timestamp for each ID in sequence with possible repeated and missing values


I have a list of users, application IDs, and activity timestamps that track the users activity throughout the day. The data is structured such as it shows a line of events for every activity on every application ID: user A will go in to conduct 6 activities on application 123, then switch to application 456 to log 4 activities, return to application 123 again for one additional activity, etc.

I've tried using lead () and lag () functions but am running into issue with data structure, especially when there's a single activity line for a given appl_id. Below is an example of my data.

|User| APPL_ID | ACTIVITY_TIME
  A    123       11/20/2020 08:11:45 AM
  A    123       11/20/2020 08:11:45 AM
  A    123       11/20/2020 08:11:45 AM
  A    123       11/20/2020 08:17:13 AM
  A    123       11/20/2020 08:17:13 AM
  A    123       11/20/2020 08:30:00 AM
  A    456       11/20/2020 09:45:02 AM
  A    456       11/20/2020 09:45:02 AM
  A    456       11/20/2020 09:55:15 AM
  A    456       11/20/2020 09:59:45 AM
  A    123       11/20/2020 10:35:00 AM
  A    789       11/20/2020 10:45:15 AM
  A    789       11/20/2020 10:50:33 AM
  B    951       11/20/2020 08:15:15 AM
  B    951       11/20/2020 08:15:15 AM
  B    951       11/20/2020 08:33:37 AM
  B    012       11/20/2020 09:13:00 AM
  C    852       11/20/2020 07:45:25 AM
  C    852       11/20/2020 07:47:41 AM
  C    741       11/20/2020 08:00:22 AM
  C    852       11/20/2020 08:25:23 AM
  C    852       11/20/2020 08:25:23 AM
  C    852       11/20/2020 08:25:23 AM
  C    852       11/20/2020 08:29:46 AM

In addition to needing the first and last activity timestamps by user and appl_id, I also need to calculate the time spent on each application by the user and the idle time between applications. Notice the caveat of application 123 at 10:35 where only one activity was logged so the IN and OUT times are both equal:

|User| APPL_ID |      IN_TIME        |      OUT_TIME     |   IN_OUT_MIN  |   IDLE_MIN
  A    123       11/20/2020 08:11 AM   11/20/2020 08:30 AM      19.0            -
  A    456       11/20/2020 09:45 AM   11/20/2020 09:59 AM      14.0           75.0
  A    123       11/20/2020 10:35 AM   11/20/2020 10:35 AM       0.0           36.0
  A    789       11/20/2020 10:45 AM   11/20/2020 10:50 AM       5.0           10.0
  B    951       11/20/2020 08:15 AM   11/20/2020 08:33 AM      18.0            -
  B    012       11/20/2020 09:13 AM   11/20/2020 09:13 AM       0.0           50.0
  C    852       11/20/2020 07:45 AM   11/20/2020 07:47 AM       2.0            -
  C    741       11/20/2020 08:00 AM   11/20/2020 08:00 AM       0.0           13.0
  C    852       11/20/2020 08:25 AM   11/20/2020 08:29 AM       4.0           25.0

These are the calculations:

in_out_time = out_time - in_time
idle_min = in_time - previous out_time

If previous OUT time is missing or from earlier date, then the idle_min calculation needs to return a blank.


Solution

  • This is a gaps and islands problem. Here is one approach that uses the difference between row numbers to identify groups of "adjacent rows" (the islands). To compute the duration of each gap, we can use window functions again:

    select user_id, appl_id,
        min(activity_time) as in_time,
        max(activity_time) as out_time,
        (max(activity_time) - min(activity_time)) * 24 * 60 as in_out_min,
        (min(activity_time) - lag(max(activity_time)) over(partition by user_id order by min(activity_time))) * 24 * 60 as idle_min
    from (
        select t.*,
            row_number() over(partition by user_id order by activity_time) rn1,
            row_number() over(partition by user_id, appl_id order by activity_time) rn2
        from mytable t
    ) t
    group by user_id, appl_id, rn1 - rn2
    order by user_id, in_time
    

    Here is a demo on DB Fiddle (I rounded the durations so they are easier to read):

    USER_ID | APPL_ID | IN_TIME             | OUT_TIME            | IN_OUT_MIN | IDLE_MIN
    :------ | ------: | :------------------ | :------------------ | ---------: | -------:
    A       |     123 | 11/20/2020 08:11 AM | 11/20/2020 08:30 AM |         19 |     null
    A       |     456 | 11/20/2020 09:45 AM | 11/20/2020 09:59 AM |         14 |       75
    A       |     123 | 11/20/2020 10:35 AM | 11/20/2020 10:35 AM |          0 |       36
    A       |     789 | 11/20/2020 10:45 AM | 11/20/2020 10:50 AM |          5 |       10
    B       |     951 | 11/20/2020 08:15 AM | 11/20/2020 08:33 AM |         18 |     null
    B       |      12 | 11/20/2020 09:13 AM | 11/20/2020 09:13 AM |          0 |       40
    C       |     852 | 11/20/2020 07:45 AM | 11/20/2020 07:47 AM |          2 |     null
    C       |     741 | 11/20/2020 08:00 AM | 11/20/2020 08:00 AM |          0 |       13
    C       |     852 | 11/20/2020 08:25 AM | 11/20/2020 08:29 AM |          4 |       25