My table looks similar to the following:
ID_1 | ID_2 | STATUS | STATUS_TIME | STATUS_BY
------------------------------------------------
1111 | 2222 | 'ONE' | [timestamp] | 'USER_1'
3333 | 4444 | 'ONE' | [timestamp] | 'USER_2'
1111 | 2222 | 'THREE' | [timestamp] | 'USER_3'
3333 | 4444 | 'TWO' | [timestamp] | 'USER_2'
3333 | 4444 | 'THREE' | [timestamp] | 'USER_4'
Essentially this table tracks status changes to particular items to which ID_1 and ID_2 pertain to. ID_1 may have many rows where it is the ID, but it is connected to the same main object. ID_2 may also have many rows where it applies.
So, to correctly match two rows, you should check if both ID_1 and ID_2 are the same. The problem I am having is due to some poorly designed software we can't tweak or get rid of (for whatever reason).
Status updates don't have any particular order in general, but for example in this case a status of ONE on a particular ID_1 + ID_2 pair should be followed by a status of TWO. Some users skip this step, so I want to see who is skipping it. Here's ideally how I would like this query to work:
In the case of the above example, I would want the following row:
ID_1 | ID_2 | STATUS | STATUS_TIME | STATUS_BY
------------------------------------------------
1111 | 2222 | 'THREE' | [timestamp] | 'USER_3'
I very much hope I'm being clear enough, please comment if I am missing anything. This is on IBM's DB2 version 9.7 Fix Pack 7. My current 'solution' is long-winded and not functioning properly since it lists all users on the ID_1 + ID_2 who used ONE, even if some of them used TWO and some of them didn't. Any guidance or help is fantastic.
You can use an analytic function LAG()
(or LEAD()
) to find out-of-sequence records.
Depending on the actual data, either of these approaches should work.
This query returns the 'THREE' records that are not preceded by 'TWO':
select * from (
select
t.*,
lag(status) over (partition by id_1, id_2 order by status_time) as prev_status
from test t
) t1 where status = 'THREE' and prev_status != 'TWO'
An alternative returns the 'ONE' records not followed by 'TWO':
select * from (
select
t.*,
lead(status) over (partition by id_1, id_2 order by status_time) as next_status
from test t
) t1 where status = 'ONE' and next_status != 'TWO'
If you need to look for values in the rows just before the previous (or after the next), specify offset: LAG(status, 2)
(or LEAD(status, 2)
).