I'm working on a project that involves combining a series of actions into sequences. The specific programming question is that I want to figure out how to use SQL (preferably mySQL) to combine multiple rows in a table when these rows match a certain sequence. One caveat is that the table is grouped by two other columns (who did it and what day they did it).
For this example, we are working with a list of actions for people's morning routines. There are two sequences of actions that we want to combine our action list into: wakeup/snoozed alarm and started day. A wakeup/snoozed alarm sequence is a (wakeup row) -> (snooze alarm) row. A started day sequence is a (wakeup row) -> (ate breakfast row) -> (brushed teeth row).
Our raw actions table looks like this:
Our desired output looks like this:
So far, I've looked into window functions and iteration using mySQL, but both of those don't seem to scale to the fact that I have multiple sequences to match on.
I feel like this might not be possible in SQL, but I thought that I'd post this here in case someone else would know how to tackle this data processing question. The end goal is to have these results stored in a view so whenever we query the actions table in the future, it queries the "processed" actions (ie actions after they've been grouped into sequences).
If I understand correctly, you can use lead()
and then some filtering logic. First, assign the new actions:
select t.*,
(case when (actionid, next_actionid) = ('wokeUp', 'snoozedAlarm')
then 'wokeup and snoozed'
when (actionid, next_actionid, next2_actionid) = ('wokeUp', 'ateBreakfast', 'brushedTeeth')
then 'started day'
else actionid
end) as action,
from (select t.*,
lead(actionId, 1) over (partition by person, day order by id) as next_actionid,
lead(actionId, 2) over (partition by person, day order by id) as next2_actionid
from t
) t;
Next, use this information for filtering:
with newactions as (
select t.*,
(case when (actionid, next_actionid) = ('wokeUp', 'snoozedAlarm')
then 'wokeup and snoozed'
when (actionid, next_actionid, next2_actionid) = ('wokeUp', 'ateBreakfast', 'brushedTeeth')
then 'started day'
else actionid
end) as action,
(case when (actionid, next_actionid) = ('wokeUp', 'snoozedAlarm')
then 2
when (actionid, next_actionid, next2_actionid) = ('wokeUp', 'ateBreakfast', 'brushedTeeth')
then 1
else 0
end) as duration
from (select t.*,
lead(actionId, 1) over (partition by person, day order by id) as next_actionid,
lead(actionId, 2) over (partition by person, day order by id) as next2_actionid
from t
) t
)
select na.*
from (select na.*,
lag(duration) over (partition by person order by id) as prev_duration,
lag(duration) over (partition by person order by id) as prev2_duration
from newactions na
) na
where not (prev_duration >= 1 or
prev2_duration >= 2
)