Search code examples
oracle-databasegroup

Group Data into sets by date intervals in oracle


I have data like the one below and am looking to get it grouped based on the time intervals where it is repeating for col1+col2 with start and end dates. Can anyone provide some tips to get the result in Oracle?

Col1 Col2   Date_col    Quantity
I1   L1     1-Jul-24    10
I1   L1     2-Jul-24    10
I1   L1     3-Jul-24    10
---------------------------------
I1   L1     12-Jul-24   10
I1   L1     13-Jul-24   10
I1   L1     14-Jul-24   10
---------------------------------
I2   L2     5-Jul-24    26
I2   L2     6-Jul-24    26
I2   L2     7-Jul-24    26
I2   L2     8-Jul-24    26
---------------------------------
I2   L2     10-Jul-24   34
I2   L2     11-Jul-24   34
---------------------------------
I2   L2     12-Jul-24   28
I2   L2     13-Jul-24   28
I2   L2     14-Jul-24   28
---------------------------------
I2   L2     21-Jul-24   20
I2   L2     22-Jul-24   20
I2   L2     23-Jul-24   20
I2   L2     24-Jul-24   20

Input Data

Need to aggregate data grouped by col1 and col2 and get the data with start and end dates as the first and last dates of the pattern.

Desired Output

Col1   Col2   Startdate      Enddate      Quantity
I1     L1     1/Jul/2024     3/Jul/2024   10
I1     L1     12/Jul/2024    14/Jul/2024  10
I2     L2     5/Jul/2024     8/Jul/2024   26
I2     L2     10/Jul/2024    11/Jul/2024  34
I2     L2     12/Jul/2024    14/Jul/2024  28
I2     L2     21/Jul/2024    24/Jul/2024  20

Below are the DDLs to create the above data.

Create table tab1 as 
select 'I1' col1,'L1' col2,to_date('01-07-2024','DD-MM-YYYY') Date_col,10 quantity from dual 
union
select 'I1','L1',to_date('02-07-2024','DD-MM-YYYY'),10 from dual 
union
select 'I1','L1',to_date('03-07-2024','DD-MM-YYYY'),10 from dual 
union
select 'I1','L1',to_date('12-07-2024','DD-MM-YYYY'),10 from dual 
union
select 'I1','L1',to_date('13-07-2024','DD-MM-YYYY'),10 from dual 
union
select 'I1','L1',to_date('14-07-2024','DD-MM-YYYY'),10 from dual 
union
select 'I2','L2',to_date('05-07-2024','DD-MM-YYYY'),26 from dual 
union
select 'I2','L2',to_date('06-07-2024','DD-MM-YYYY'),26 from dual 
union
select 'I2','L2',to_date('07-07-2024','DD-MM-YYYY'),26 from dual 
union
select 'I2','L2',to_date('08-07-2024','DD-MM-YYYY'),26 from dual 
union
select 'I2','L2',to_date('10-07-2024','DD-MM-YYYY'),34 from dual 
union
select 'I2','L2',to_date('11-07-2024','DD-MM-YYYY'),34 from dual 
union
select 'I2','L2',to_date('12-07-2024','DD-MM-YYYY'),28 from dual 
union
select 'I2','L2',to_date('13-07-2024','DD-MM-YYYY'),28 from dual 
union
select 'I2','L2',to_date('14-07-2024','DD-MM-YYYY'),28 from dual 
union
select 'I2','L2',to_date('21-07-2024','DD-MM-YYYY'),20 from dual 
union
select 'I2','L2',to_date('22-07-2024','DD-MM-YYYY'),20 from dual 
union
select 'I2','L2',to_date('23-07-2024','DD-MM-YYYY'),20 from dual 
union
select 'I2','L2',to_date('24-07-2024','DD-MM-YYYY'),20 from dual;

Solution

  • From Oracle 12, you can use MATCH_RECOGNIZE to perform row-by-row pattern matching:

    SELECT *
    FROM   tab1
           MATCH_RECOGNIZE(
             PARTITION BY col1, col2
             ORDER BY date_col
             MEASURES
               FIRST(quantity) AS quantity,
               FIRST(date_col) AS start_date,
               LAST(date_col) AS end_date
             PATTERN (first_row consecutive*)
             DEFINE
               consecutive AS  quantity = FIRST(quantity)
                           AND date_col <= PREV(date_col) + INTERVAL '1' DAY
           )
    

    Note: You (probably) do not want to include quantity in the columns you are partitioning by and, instead, check for identical quantities in the pattern matching.

    Which, for the sample data:

    Create table tab1 (col1, col2, date_col, quantity) as 
    SELECT 'I1','L1',DATE '2024-07-01',10 FROM DUAL UNION ALL
    SELECT 'I1','L1',DATE '2024-07-02',10 FROM DUAL UNION ALL
    SELECT 'I1','L1',DATE '2024-07-03',10 FROM DUAL UNION ALL
    SELECT 'I1','L1',DATE '2024-07-12',10 FROM DUAL UNION ALL
    SELECT 'I1','L1',DATE '2024-07-13',10 FROM DUAL UNION ALL
    SELECT 'I1','L1',DATE '2024-07-14',10 FROM DUAL UNION ALL
    SELECT 'I2','L2',DATE '2024-07-05',26 FROM DUAL UNION ALL
    SELECT 'I2','L2',DATE '2024-07-06',26 FROM DUAL UNION ALL
    SELECT 'I2','L2',DATE '2024-07-07',26 FROM DUAL UNION ALL
    SELECT 'I2','L2',DATE '2024-07-08',26 FROM DUAL UNION ALL
    SELECT 'I2','L2',DATE '2024-07-10',34 FROM DUAL UNION ALL
    SELECT 'I2','L2',DATE '2024-07-11',34 FROM DUAL UNION ALL
    SELECT 'I2','L2',DATE '2024-07-12',28 FROM DUAL UNION ALL
    SELECT 'I2','L2',DATE '2024-07-13',28 FROM DUAL UNION ALL
    SELECT 'I2','L2',DATE '2024-07-14',28 FROM DUAL UNION ALL
    SELECT 'I2','L2',DATE '2024-07-21',20 FROM DUAL UNION ALL
    SELECT 'I2','L2',DATE '2024-07-22',20 FROM DUAL UNION ALL
    SELECT 'I2','L2',DATE '2024-07-23',20 FROM DUAL UNION ALL
    SELECT 'I2','L2',DATE '2024-07-24',20 FROM DUAL;
    

    Outputs:

    COL1 COL2 QUANTITY START_DATE END_DATE
    I1 L1 10 2024-07-01 00:00:00 2024-07-03 00:00:00
    I1 L1 10 2024-07-12 00:00:00 2024-07-14 00:00:00
    I2 L2 26 2024-07-05 00:00:00 2024-07-08 00:00:00
    I2 L2 34 2024-07-10 00:00:00 2024-07-11 00:00:00
    I2 L2 28 2024-07-12 00:00:00 2024-07-14 00:00:00
    I2 L2 20 2024-07-21 00:00:00 2024-07-24 00:00:00

    fiddle


    In earlier versions you can use:

    SELECT col1,
           col2,
           MIN(date_col) AS start_date,
           MAX(date_col) AS end_date,
           MAX(quantity) AS quantity
    FROM   (
      SELECT col1,
             col2,
             date_col,
             quantity,
             SUM(has_changed_group) OVER (PARTITION BY col1, col2 ORDER BY date_col)
               AS grp
      FROM   (
        SELECT col1,
               col2,
               date_col,
               quantity,
               CASE
               WHEN quantity = LAG(quantity) OVER (PARTITION BY col1, col2 ORDER BY date_col)
               AND  date_col <= LAG(date_col) OVER (PARTITION BY col1, col2 ORDER BY date_col)
                                + INTERVAL '1' DAY
               THEN 0
               ELSE 1
               END AS has_changed_group
        FROM   tab1 t
      )
    )
    GROUP BY col1, col2, grp
    

    fiddle