Search code examples
sqloraclefunctionanalytical

Oracle SQL: Generating a group id in a result set by country within a time boundary


I have the following result set ordered by ID, REGISTRATION_TS, COUNTRY

ID    CO REGISTRATION_TS    
----- -- -------------------
56053 CH 05/07/2022 20:57:47
56053 CH 05/07/2022 23:26:05
56053 CH 06/07/2022 03:40:18
56053 CH 06/07/2022 03:42:58
56053 DE 06/07/2022 07:50:21
56053 DE 12/07/2022 05:05:14
56053 DE 13/07/2022 12:43:06
56053 CH 26/07/2022 22:52:20
56053 CH 27/07/2022 04:05:14
56053 DE 27/07/2022 08:47:55
56053 DE 27/07/2022 15:34:32
86EBD SI 29/07/2022 18:05:11
86EBD SI 29/07/2022 18:13:21
86EBD AT 30/07/2022 07:35:15
86EBD DE 30/07/2022 07:35:15
86EBD AT 30/07/2022 07:38:06
86EBD AT 30/07/2022 07:38:06
86EBD AT 30/07/2022 07:46:16
86EBD AT 30/07/2022 07:46:16
86EBD SK 30/07/2022 13:14:45

And I would like to have an additional column grouping by country within a time period for an ID. In my case, for ID 56053, country CH between 05/07/2022 20:57:47 and 06/07/2022 03:42:58 should have a Group = 1, the next three rows for DE, should have Group = 2, the next two rows CH, Group should be 3. Once a new ID starts, the ID should be reset to 1.

The new result set should be:

ID    CO REGISTRATION_TS     GROUP
----- -- ------------------- -----
56053 CH 05/07/2022 20:57:47 1
56053 CH 05/07/2022 23:26:05 1
56053 CH 06/07/2022 03:40:18 1
56053 CH 06/07/2022 03:42:58 1
56053 DE 06/07/2022 07:50:21 2
56053 DE 12/07/2022 05:05:14 2
56053 DE 13/07/2022 12:43:06 2
56053 CH 26/07/2022 22:52:20 3
56053 CH 27/07/2022 04:05:14 3
56053 DE 27/07/2022 08:47:55 4
56053 DE 27/07/2022 15:34:32 4
86EBD SI 29/07/2022 18:05:11 1
86EBD SI 29/07/2022 18:13:21 1
86EBD AT 30/07/2022 07:35:15 2
86EBD DE 30/07/2022 07:35:15 3
86EBD AT 30/07/2022 07:38:06 4
86EBD AT 30/07/2022 07:38:06 4
86EBD AT 30/07/2022 07:46:16 4
86EBD AT 30/07/2022 07:46:16 4
86EBD SK 30/07/2022 13:14:45 5

I've been trying with analytical functions but I haven't hit the solution yet. My next option would be to code a PL/SQL block but I am sure this could be solved with pure SQL.

Oracle 19.0


Solution

  • From Oracle 12, you can efficiently solve row-by-row pattern matching problems with MATCH_RECOGNIZE:

    SELECT *
    FROM   table_name
    MATCH_RECOGNIZE (
      PARTITION BY id
      ORDER BY registration_ts
      MEASURES
        MATCH_NUMBER() AS mno
      ALL ROWS PER MATCH
      PATTERN (same_country+)
      DEFINE same_country AS FIRST(country) = country
    )
    

    Which, for the sample data:

    CREATE TABLE table_name (ID, country, REGISTRATION_TS) AS
    SELECT '56053', 'CH', DATE '2022-07-05' + INTERVAL '20:57:47' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT '56053', 'CH', DATE '2022-07-05' + INTERVAL '23:26:05' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT '56053', 'CH', DATE '2022-07-06' + INTERVAL '03:40:18' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT '56053', 'CH', DATE '2022-07-06' + INTERVAL '03:42:58' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT '56053', 'DE', DATE '2022-07-06' + INTERVAL '07:50:21' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT '56053', 'DE', DATE '2022-07-12' + INTERVAL '05:05:14' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT '56053', 'DE', DATE '2022-07-13' + INTERVAL '12:43:06' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT '56053', 'CH', DATE '2022-07-26' + INTERVAL '22:52:20' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT '56053', 'CH', DATE '2022-07-27' + INTERVAL '04:05:14' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT '56053', 'DE', DATE '2022-07-27' + INTERVAL '08:47:55' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT '56053', 'DE', DATE '2022-07-27' + INTERVAL '15:34:32' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT '86EBD', 'SI', DATE '2022-07-29' + INTERVAL '18:05:11' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT '86EBD', 'SI', DATE '2022-07-29' + INTERVAL '18:13:21' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT '86EBD', 'AT', DATE '2022-07-30' + INTERVAL '07:35:15' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT '86EBD', 'DE', DATE '2022-07-30' + INTERVAL '07:35:15' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT '86EBD', 'AT', DATE '2022-07-30' + INTERVAL '07:38:06' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT '86EBD', 'AT', DATE '2022-07-30' + INTERVAL '07:38:06' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT '86EBD', 'AT', DATE '2022-07-30' + INTERVAL '07:46:16' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT '86EBD', 'AT', DATE '2022-07-30' + INTERVAL '07:46:16' HOUR TO SECOND FROM DUAL UNION ALL
    SELECT '86EBD', 'SK', DATE '2022-07-30' + INTERVAL '13:14:45' HOUR TO SECOND FROM DUAL;
    

    Outputs:

    ID REGISTRATION_TS MNO COUNTRY
    56053 2022-07-05 20:57:47 1 CH
    56053 2022-07-05 23:26:05 1 CH
    56053 2022-07-06 03:40:18 1 CH
    56053 2022-07-06 03:42:58 1 CH
    56053 2022-07-06 07:50:21 2 DE
    56053 2022-07-12 05:05:14 2 DE
    56053 2022-07-13 12:43:06 2 DE
    56053 2022-07-26 22:52:20 3 CH
    56053 2022-07-27 04:05:14 3 CH
    56053 2022-07-27 08:47:55 4 DE
    56053 2022-07-27 15:34:32 4 DE
    86EBD 2022-07-29 18:05:11 1 SI
    86EBD 2022-07-29 18:13:21 1 SI
    86EBD 2022-07-30 07:35:15 2 AT
    86EBD 2022-07-30 07:35:15 3 DE
    86EBD 2022-07-30 07:38:06 4 AT
    86EBD 2022-07-30 07:38:06 4 AT
    86EBD 2022-07-30 07:46:16 4 AT
    86EBD 2022-07-30 07:46:16 4 AT
    86EBD 2022-07-30 13:14:45 5 SK

    fiddle