Search code examples
sqldb2db2-luw

Query to find "out of sequence" rows


My table looks similar to the following:

ID_1 | ID_2 | STATUS   | STATUS_TIME | STATUS_BY
------------------------------------------------
1111 | 2222 | 'ONE'    | [timestamp] | 'USER_1'
3333 | 4444 | 'ONE'    | [timestamp] | 'USER_2'
1111 | 2222 | 'THREE'  | [timestamp] | 'USER_3'
3333 | 4444 | 'TWO'    | [timestamp] | 'USER_2'
3333 | 4444 | 'THREE'  | [timestamp] | 'USER_4'

Essentially this table tracks status changes to particular items to which ID_1 and ID_2 pertain to. ID_1 may have many rows where it is the ID, but it is connected to the same main object. ID_2 may also have many rows where it applies.

So, to correctly match two rows, you should check if both ID_1 and ID_2 are the same. The problem I am having is due to some poorly designed software we can't tweak or get rid of (for whatever reason).

Status updates don't have any particular order in general, but for example in this case a status of ONE on a particular ID_1 + ID_2 pair should be followed by a status of TWO. Some users skip this step, so I want to see who is skipping it. Here's ideally how I would like this query to work:

  • For the current day, I want to see the row which chronologically follows the row with status update 'ONE', ONLY if it's status is not 'TWO'.
  • I do not need more than one instance per ID_1 + ID_2, UNLESS the user doing the bad statuses is different.
  • This means if USER_1 violated on five entries where ID_1 is the same but ID_2 is different, I only want 1. If USER_1 and USER_2 both violate on ID_1, they need one entry each.
  • However, if USER 1 violates on two different ID_1s, both of those need a record.
  • A status update of TWO cannot be preceded by anything other than ONE, this case does not need to be considered.

In the case of the above example, I would want the following row:

ID_1 | ID_2 | STATUS   | STATUS_TIME | STATUS_BY
------------------------------------------------
1111 | 2222 | 'THREE'  | [timestamp] | 'USER_3'

I very much hope I'm being clear enough, please comment if I am missing anything. This is on IBM's DB2 version 9.7 Fix Pack 7. My current 'solution' is long-winded and not functioning properly since it lists all users on the ID_1 + ID_2 who used ONE, even if some of them used TWO and some of them didn't. Any guidance or help is fantastic.


Solution

  • You can use an analytic function LAG() (or LEAD()) to find out-of-sequence records.

    Depending on the actual data, either of these approaches should work.

    This query returns the 'THREE' records that are not preceded by 'TWO':

    select * from (
     select 
       t.*, 
       lag(status) over (partition by id_1, id_2 order by status_time) as prev_status 
     from test t
    ) t1 where status = 'THREE' and prev_status != 'TWO'
    

    An alternative returns the 'ONE' records not followed by 'TWO':

    select * from (
     select 
      t.*, 
      lead(status) over (partition by id_1, id_2 order by status_time) as next_status 
     from test t
    ) t1 where status = 'ONE' and next_status != 'TWO'
    

    If you need to look for values in the rows just before the previous (or after the next), specify offset: LAG(status, 2) (or LEAD(status, 2)).