Search code examples
sqldatabaseoracleoracle11g

Merge Datetime Ranges Oracle SQL or PL/SQL


I have been struggling to merge datetime ranges in oracle SQL or PL/SQL (Database Standard Edition 11gR2).

I am trying to merge datetime ranges so that the following data

order_id    start_date_time         end_date_time
3933        04/02/2020 08:00:00     04/02/2020 12:00:00
3933        04/02/2020 13:30:00     04/02/2020 17:00:00
3933        04/02/2020 14:00:00     04/02/2020 19:00:00
3933        05/02/2020 13:40:12     05/02/2020 14:34:48
3933        05/02/2020 14:00:00     05/02/2020 18:55:12
3933        05/02/2020 14:49:48     05/02/2020 15:04:48
3933        06/02/2020 08:00:00     06/02/2020 12:00:00
3933        06/02/2020 13:30:00     06/02/2020 17:00:00
3933        06/02/2020 14:10:12     06/02/2020 18:49:48
3933        07/02/2020 08:00:00     07/02/2020 10:30:00
3933        07/02/2020 08:00:00     07/02/2020 12:00:00
3933        07/02/2020 13:30:00     07/02/2020 17:00:00
11919       14/05/2020 09:00:00     14/05/2020 17:00:00
11919       14/05/2020 09:00:00     14/05/2020 17:00:00
11919       14/05/2020 15:00:00     14/05/2020 16:30:00
11919       15/05/2020 08:40:12     15/05/2020 16:30:00
11919       15/05/2020 09:40:12     15/05/2020 16:30:00
11919       15/05/2020 10:15:00     15/05/2020 12:15:00
11919       15/05/2020 13:19:48     15/05/2020 16:00:00
11919       18/05/2020 08:49:48     18/05/2020 09:45:00
11919       18/05/2020 10:00:00     18/05/2020 17:00:00
11919       18/05/2020 10:00:00     18/05/2020 16:58:12
11919       18/05/2020 15:34:48     18/05/2020 16:10:12
11919       18/05/2020 16:30:00     18/05/2020 16:45:00
...         ...                     ...

would transform into the following result set

--after merge (this is the result I am seeking)
order_id    start_date_time         end_date_time
3933        04/02/2020 08:00:00     04/02/2020 12:00:00
3933        04/02/2020 13:30:00     04/02/2020 19:00:00
3933        05/02/2020 13:40:12     05/02/2020 18:55:12
3933        06/02/2020 08:00:00     06/02/2020 12:00:00
3933        06/02/2020 13:30:00     06/02/2020 18:49:48
3933        07/02/2020 08:00:00     07/02/2020 12:00:00
3933        07/02/2020 13:30:00     07/02/2020 17:00:00
11919       14/05/2020 09:00:00     14/05/2020 17:00:00
11919       15/05/2020 08:40:12     15/05/2020 16:30:00
11919       18/05/2020 08:49:48     18/05/2020 17:00:00
...         ...                     ...

The format of start_date_time and end_date_time is DAY/MONTH/YEAR HH24:MI:SS.

Any suggestion/solution on how to make that merge in Oracle SQL or PL/SQL?

I thought that was a trivial problem, however I was not able to find a solution on the internet yet.

Thanks in advance.


Solution

  • This is adapted from this answer which contains an explanation of the code. All that has changed is to add PARTITION BY order_id to calculate the date ranges for each order_id and then to return the ranges (rather than total the values, as per the linked answer):

    SELECT order_id,
           start_date_time,
           end_date_time
    FROM   (
      SELECT order_id,
             LAG( dt ) OVER ( PARTITION BY order_id ORDER BY dt ) AS start_date_time,
             dt AS end_date_time,
             start_end
      FROM   (
        SELECT order_id,
               dt,
               CASE SUM( value ) OVER ( PARTITION BY order_id ORDER BY dt ASC, value DESC, ROWNUM ) * value
                 WHEN 1 THEN 'start'
                 WHEN 0 THEN 'end'
               END AS start_end
        FROM   table_name
        UNPIVOT ( dt FOR value IN ( start_date_time AS 1, end_date_time AS -1 ) )
      )
      WHERE start_end IS NOT NULL
    )
    WHERE  start_end = 'end';
    

    From Oracle 12, you can use MATCH_RECONIZE to do row-by-row processing:

    SELECT *
    FROM   table_name
    MATCH_RECOGNIZE(
      PARTITION BY order_id
      ORDER     BY start_date_time
      MEASURES
        FIRST(start_date_time) AS start_date_time,
        MAX(end_date_time)     AS end_date_time
      ONE ROW PER MATCH
      PATTERN (overlapping_rows* last_row)
      DEFINE
        overlapping_rows AS NEXT(start_date_time) <= MAX(end_date_time)
    )
    

    Which, for your test data:

    CREATE TABLE table_name (
      order_id NUMBER,
      start_date_time DATE,
      end_date_time DATE
    );
    
    INSERT INTO table_name ( order_id, start_date_time, end_date_time )
    SELECT 3933, TIMESTAMP '2020-02-04 08:00:00', TIMESTAMP '2020-02-04 12:00:00' FROM DUAL UNION ALL
    SELECT 3933, TIMESTAMP '2020-02-04 13:30:00', TIMESTAMP '2020-02-04 17:00:00' FROM DUAL UNION ALL
    SELECT 3933, TIMESTAMP '2020-02-04 14:00:00', TIMESTAMP '2020-02-04 19:00:00' FROM DUAL UNION ALL
    SELECT 3933, TIMESTAMP '2020-02-05 13:40:12', TIMESTAMP '2020-02-05 14:34:48' FROM DUAL UNION ALL
    SELECT 3933, TIMESTAMP '2020-02-05 14:00:00', TIMESTAMP '2020-02-05 18:55:12' FROM DUAL UNION ALL
    SELECT 3933, TIMESTAMP '2020-02-05 14:49:48', TIMESTAMP '2020-02-05 15:04:48' FROM DUAL UNION ALL
    SELECT 3933, TIMESTAMP '2020-02-06 08:00:00', TIMESTAMP '2020-02-06 12:00:00' FROM DUAL UNION ALL
    SELECT 3933, TIMESTAMP '2020-02-06 13:30:00', TIMESTAMP '2020-02-06 17:00:00' FROM DUAL UNION ALL
    SELECT 3933, TIMESTAMP '2020-02-06 14:10:12', TIMESTAMP '2020-02-06 18:49:48' FROM DUAL UNION ALL
    SELECT 3933, TIMESTAMP '2020-02-07 08:00:00', TIMESTAMP '2020-02-07 10:30:00' FROM DUAL UNION ALL
    SELECT 3933, TIMESTAMP '2020-02-07 08:00:00', TIMESTAMP '2020-02-07 12:00:00' FROM DUAL UNION ALL
    SELECT 3933, TIMESTAMP '2020-02-07 13:30:00', TIMESTAMP '2020-02-07 17:00:00' FROM DUAL UNION ALL
    SELECT 11919, TIMESTAMP '2020-05-14 09:00:00', TIMESTAMP '2020-05-14 17:00:00' FROM DUAL UNION ALL
    SELECT 11919, TIMESTAMP '2020-05-14 09:00:00', TIMESTAMP '2020-05-14 17:00:00' FROM DUAL UNION ALL
    SELECT 11919, TIMESTAMP '2020-05-14 15:00:00', TIMESTAMP '2020-05-14 16:30:00' FROM DUAL UNION ALL
    SELECT 11919, TIMESTAMP '2020-05-15 08:40:12', TIMESTAMP '2020-05-15 16:30:00' FROM DUAL UNION ALL
    SELECT 11919, TIMESTAMP '2020-05-15 09:40:12', TIMESTAMP '2020-05-15 16:30:00' FROM DUAL UNION ALL
    SELECT 11919, TIMESTAMP '2020-05-15 10:15:00', TIMESTAMP '2020-05-15 12:15:00' FROM DUAL UNION ALL
    SELECT 11919, TIMESTAMP '2020-05-15 13:19:48', TIMESTAMP '2020-05-15 16:00:00' FROM DUAL UNION ALL
    SELECT 11919, TIMESTAMP '2020-05-18 08:49:48', TIMESTAMP '2020-05-18 09:45:00' FROM DUAL UNION ALL
    SELECT 11919, TIMESTAMP '2020-05-18 10:00:00', TIMESTAMP '2020-05-18 17:00:00' FROM DUAL UNION ALL
    SELECT 11919, TIMESTAMP '2020-05-18 10:00:00', TIMESTAMP '2020-05-18 16:58:12' FROM DUAL UNION ALL
    SELECT 11919, TIMESTAMP '2020-05-18 15:34:48', TIMESTAMP '2020-05-18 16:10:12' FROM DUAL UNION ALL
    SELECT 11919, TIMESTAMP '2020-05-18 16:30:00', TIMESTAMP '2020-05-18 16:45:00' FROM DUAL;
    

    Which both output:

    ORDER_ID | START_DATE_TIME     | END_DATE_TIME      
    -------: | :------------------ | :------------------
        3933 | 2020-02-04 08:00:00 | 2020-02-04 12:00:00
        3933 | 2020-02-04 13:30:00 | 2020-02-04 19:00:00
        3933 | 2020-02-05 13:40:12 | 2020-02-05 18:55:12
        3933 | 2020-02-06 08:00:00 | 2020-02-06 12:00:00
        3933 | 2020-02-06 13:30:00 | 2020-02-06 18:49:48
        3933 | 2020-02-07 08:00:00 | 2020-02-07 12:00:00
        3933 | 2020-02-07 13:30:00 | 2020-02-07 17:00:00
       11919 | 2020-05-14 09:00:00 | 2020-05-14 17:00:00
       11919 | 2020-05-15 08:40:12 | 2020-05-15 16:30:00
       11919 | 2020-05-18 08:49:48 | 2020-05-18 09:45:00
       11919 | 2020-05-18 10:00:00 | 2020-05-18 17:00:00
    

    db<>fiddle here