Search code examples
sqloracle-databaseplsqloracle12cgaps-and-islands

Efficiently group rows into episodes without PL/SQL


I have a dataset with one record per day per entity. Each day will only ever have one status but a status can appear over a range of days or only on a single day

What I want to do is group these into episodes. So whenever the ID changes, the STATUS changes OR the next record is more than a day apart I want to consider it a new episode.

I have played around with analytic functions but while I can group by status I then loose track of each set of days (though potentially I could do it in one pass per status)

I also could do this in PL/SQL but it's incredibly slow in my testing. I'm hoping there is a way to perform this as a query or at least partially pre-process as a query so the pl/sql loop is faster.

ID                      DAY         STATUS  Comment
E0000000000054245349    27-Feb-16   24      Start
E0000000000054245349    28-Feb-16   24  
E0000000000054245349    29-Feb-16   24  
E0000000000054245349    1-Mar-16    24  
E0000000000054245349    3-Mar-16    21      Gap & new status
E0000000000054245349    4-Mar-16    21  
            continuing daily
E0000000000054245349    12-Mar-16   21  
E0000000000054245349    13-Mar-16   21  
E0000000000054245349    14-Mar-16   21  
E0000000000054245349    15-Mar-16   40      No gap, but new status
E0000000000054245349    16-Mar-16   40  
E0000000000054245349    18-Mar-16   40      Gap, no new status
E0000000000054245349    19-Mar-16   40      
E0000000000054245349    1-Jan-17    21      Gap & new status
E0000000000054245349    2-Jan-17    21
E0000000000054245349    3-Jan-17    21      
E0000000000054245349    5-Jan-17    25      Gap, status and single day      

My ideal dataset would like something like this. Bonus points if it contains the previous/next status for records that are 1 day before/after but I can always get them with a subsequent query if need be

ID                      START       END         STATUS
E0000000000054245349    27-Feb-16   1-Mar-16    24
E0000000000054245349    3-Mar-16    14-Mar-16   21
E0000000000054245349    15-Mar-16   16-Mar-16   40
E0000000000054245349    18-Mar-16   19-Mar-16   40
E0000000000054245349    1-Jan-17    3-Jan-17    21
E0000000000054245349    5-Jan-17    5-Jan-17    25      

Solution

  • You can do it easily using Tabibitosan Method:

    select id, min(day) mnd, max(day) mxd, status
      from (
        select day - row_number() over (partition by id order by day) grp, id, day, status
          from t)
      group by id, grp, status
      order by id, grp;
    

    dbfiddle demo

    This gives desirded output. I do not catch that phrase Bonus points if it contains the previous/next status for records that are 1 day before/after. In your output episodes may have no rows one day before / after. If you want status from previous / next row simply use lag() and lead (). But if you want it only if new episode is because of status change use analytical functions conditionally:

    select id, mnd, mxd, status, 
           case mnd when lag(mxd) over (partition by id order by mxd) + 1
                    then lag(status) over (partition by id order by mxd) 
           end prev_status
      from (select id, min(day) mnd, max(day) mxd, status
              from (select day - row_number() over (partition by id order by day) grp, 
                           id, day, status
                      from t)
              group by id, grp, status)
      order by id, mnd;
    

    ... and same for lead().