I have a dataset with one record per day per entity. Each day will only ever have one status but a status can appear over a range of days or only on a single day
What I want to do is group these into episodes. So whenever the ID changes, the STATUS changes OR the next record is more than a day apart I want to consider it a new episode.
I have played around with analytic functions but while I can group by status I then loose track of each set of days (though potentially I could do it in one pass per status)
I also could do this in PL/SQL but it's incredibly slow in my testing. I'm hoping there is a way to perform this as a query or at least partially pre-process as a query so the pl/sql loop is faster.
ID DAY STATUS Comment
E0000000000054245349 27-Feb-16 24 Start
E0000000000054245349 28-Feb-16 24
E0000000000054245349 29-Feb-16 24
E0000000000054245349 1-Mar-16 24
E0000000000054245349 3-Mar-16 21 Gap & new status
E0000000000054245349 4-Mar-16 21
continuing daily
E0000000000054245349 12-Mar-16 21
E0000000000054245349 13-Mar-16 21
E0000000000054245349 14-Mar-16 21
E0000000000054245349 15-Mar-16 40 No gap, but new status
E0000000000054245349 16-Mar-16 40
E0000000000054245349 18-Mar-16 40 Gap, no new status
E0000000000054245349 19-Mar-16 40
E0000000000054245349 1-Jan-17 21 Gap & new status
E0000000000054245349 2-Jan-17 21
E0000000000054245349 3-Jan-17 21
E0000000000054245349 5-Jan-17 25 Gap, status and single day
My ideal dataset would like something like this. Bonus points if it contains the previous/next status for records that are 1 day before/after but I can always get them with a subsequent query if need be
ID START END STATUS
E0000000000054245349 27-Feb-16 1-Mar-16 24
E0000000000054245349 3-Mar-16 14-Mar-16 21
E0000000000054245349 15-Mar-16 16-Mar-16 40
E0000000000054245349 18-Mar-16 19-Mar-16 40
E0000000000054245349 1-Jan-17 3-Jan-17 21
E0000000000054245349 5-Jan-17 5-Jan-17 25
You can do it easily using Tabibitosan Method:
select id, min(day) mnd, max(day) mxd, status
from (
select day - row_number() over (partition by id order by day) grp, id, day, status
from t)
group by id, grp, status
order by id, grp;
This gives desirded output. I do not catch that phrase Bonus points if it contains the previous/next status for records that are 1 day before/after. In your output episodes may have no rows one day before / after. If you want status from previous / next row simply use lag()
and lead ()
. But if you want it only if new episode is because of status change use analytical functions conditionally:
select id, mnd, mxd, status,
case mnd when lag(mxd) over (partition by id order by mxd) + 1
then lag(status) over (partition by id order by mxd)
end prev_status
from (select id, min(day) mnd, max(day) mxd, status
from (select day - row_number() over (partition by id order by day) grp,
id, day, status
from t)
group by id, grp, status)
order by id, mnd;
... and same for lead()
.