Search code examples
sqloracle-databasegroup-bywindow-functionsanalytic-functions

Group and Summarize Time Series Transactions with Start and Stop Times


I have a series of transactions. I need to summarize the total time spent working on these transactions and a count of transactions within groups that break when you move from one group to another. This means that the group can repeat if there is a set of transactions for that group for multiple time slots. I am using an oracle database and SQL to do this summarization. I'm looking for some help writing the SQL to summarize this data. I've tried many different queries using various analytical and grouping functions.

Source Data:

User ID | Area | Start Time | End Time | Transaction ID
jsmith  24  07-JUL-14 09.06.21.000000000 AM 07-JUL-14 09.14.51.000000000 AM 63665020
jsmith  24  07-JUL-14 11.06.01.000000000 AM 07-JUL-14 11.07.10.000000000 AM 63679145
jsmith  24  07-JUL-14 11.16.36.000000000 AM 07-JUL-14 11.16.51.000000000 AM 63680441
jsmith  21  07-JUL-14 11.33.05.000000000 AM 07-JUL-14 11.33.18.000000000 AM 63683133
jsmith  21  07-JUL-14 11.48.55.000000000 AM 07-JUL-14 11.51.03.000000000 AM 63685906
jsmith  21  07-JUL-14 11.55.48.000000000 AM 07-JUL-14 11.56.01.000000000 AM 63686651
jsmith  24  07-JUL-14 12.32.51.000000000 PM 07-JUL-14 12.32.55.000000000 PM 63690943
jsmith  24  07-JUL-14 12.32.55.000000000 PM 07-JUL-14 12.33.02.000000000 PM 63690956
jsmith  21  07-JUL-14 12.42.05.000000000 PM 07-JUL-14 12.42.11.000000000 PM 63692212
jsmith  21  07-JUL-14 12.44.38.000000000 PM 07-JUL-14 12.44.43.000000000 PM 63692600
jsmith  25  07-JUL-14 01.01.18.000000000 PM 07-JUL-14 01.01.52.000000000 PM 63694808
jsmith  25  07-JUL-14 01.03.43.000000000 PM 07-JUL-14 01.04.44.000000000 PM 63695353
jsmith  25  07-JUL-14 01.12.30.000000000 PM 07-JUL-14 01.14.06.000000000 PM 63697002
jsmith  24  07-JUL-14 01.14.11.000000000 PM 07-JUL-14 01.14.23.000000000 PM 63697045
jsmith  24  07-JUL-14 01.21.41.000000000 PM 07-JUL-14 01.23.46.000000000 PM 63699080

Summary Data Desired: (mock up)

User ID | Area | Start Time | End Time | Total Minutes | Total Transactions
jsmith    24          9:06am   11:16am     110              3 
jsmith    21         11:33am   11:56am      23              3 
jsmith    24    .....
jsmith    21    .....
jsmith    25    .....
jsmith    24    .....

Solution

  • To resolve such issues, you need to generate some group number for each consecutive rows. Here I first use LAG to generate a tick mark each time we start a new group. An outer query using SUM will count the number of tick marks from the first from to the current one to generate a group number:

    SELECT "Area", 
           MIN("Start Time") as "Start Time", 
           MAX("End Time") as "End Time",  
           SUM("End Time" - "Start Time")*24*60 as "Total Minutes", 
           COUNT("Transaction ID") as "#Transaction ID"
    FROM (
      SELECT SUM(clk)
             OVER (ORDER BY "Start Time"
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) grp,
    --       ^^^^^^^^^^^^^
    --        Generate a group number by summing the tick marks
             V1.*
      FROM (
        SELECT CASE
                 WHEN LAG("Area", 1, NULL) OVER (ORDER BY "Start Time") = "Area"
                 THEN 0
                 ELSE 1
               END clk,
    --         ^^^^^^^^
    --         Set a tick mark ("clock") to 1 each time we change group
             T.*
        FROM T
        ) V1
      ) V2
    GROUP BY GRP, "Area"
    ORDER BY "Start Time"
    

    See http://sqlfiddle.com/#!4/93f05/2


    A little bit more difficult to grasp, but this works too:

    SELECT "Area", 
           MIN("Start Time") as "Start Time", 
           MAX("End Time") as "End Time",  
           SUM("End Time" - "Start Time")*60 as "Total Minutes", 
           COUNT("Transaction ID") as "#Transaction ID"
    FROM (
      SELECT ROWNUM-ROW_NUMBER() 
                       OVER (PARTITION BY "Area" ORDER BY "Start Time") grp,
           T.*
      FROM T
      ORDER BY "Start Time"
    ) V
    GROUP BY GRP, "Area"
    ORDER BY "Start Time"
    

    See http://sqlfiddle.com/#!4/07d43/6

    In that case, beware that you can have duplicate grp -- but only for different group. Unless the rows are contiguous. Please refer to the comments bellow for a discussion about that.