Search code examples
sqlsql-servert-sqlsql-server-2019

Query to find problems with missing row pairs, given a partition?


I have a table where information is being stored like so:

Work Order Employee FunctionGroup FunctionType Timestamp
WO1 Emp1 Group1 Start 7/27/23 09:00
WO1 Emp1 Group1 Stop 7/27/23 10:00
WO1 Emp1 Group1 Start 7/27/23 11:00
WO1 Emp1 Group1 Stop 7/27/23 12:00
WO2 Emp2 Group2 Start 7/27/23 13:00
WO2 Emp2 Group2 Stop 7/27/23 14:00
WO2 Emp2 Group2 Start 7/27/23 15:00
WO2 Emp2 Group2 Stop 7/27/23 16:00
WO3 Emp3 Group3 Start 7/27/23 17:00 (problem here: since the next row is also a Start, then this row should be returned)
WO3 Emp3 Group3 Start 7/27/23 18:00
WO3 Emp3 Group3 Start 7/27/23 19:00
WO3 Emp3 Group3 Stop 7/27/23 20:00
WO4 Emp4 Group4 Stop 7/27/23 17:00 (problem here: since the dataset for this partition starts with a Stop instead of a Start, then this row should be returned)
WO4 Emp4 Group4 Start 7/27/23 18:00
WO4 Emp4 Group4 Start 7/27/23 19:00
WO4 Emp4 Group4 Stop 7/27/23 20:00

For each work order, employee, and function group, an employee can insert a Start and a Stop row (this is basically how the partition is defined). They must always be in the order of Start then Stop and cannot be backwards. This data collection will all start on a specific date/time, so there is a clean point where all the data must come in beginning with a Start. They can insert these pairs as many times as the need to. I need to write a query that checks to see if there is a problem with the pairings and if so - return the first row where this problem appeared.

In the table above, the last 2 sections show a potential problem and the row that must be returned. The main challenge here is just figuring out how to determine which rows go together as a pair. One alternative way I am thinking of handling this is to simply insert a unique ID with every Start, then insert that same ID with every Stop. This might work, but for now I need a query that can show me problems for the test data I'm using.


Solution

  • So based on the expectation that the data should be in start/stop pairs, you can allocate a row number and then compare the expected FunctionType with the actual FunctionType, or rather, since you want the error to appear on the line before, the next FunctionType (using LEAD) with the expected FunctionType.

    declare @TestData table (WorkOrder varchar(3), Employee varchar(4), FunctionGroup varchar(6), FunctionType varchar(5), [Timestamp] datetime)
    
    insert into @TestData (WorkOrder, Employee, FunctionGroup, FunctionType, [Timestamp])
    values
    ('WO1','Emp1','Group1','Start','7/27/23 09:00'),
    ('WO1','Emp1','Group1','Stop','7/27/23 10:00'),
    ('WO1','Emp1','Group1','Start','7/27/23 11:00'),
    ('WO1','Emp1','Group1','Stop','7/27/23 12:00'),
    ('WO2','Emp2','Group2','Start','7/27/23 13:00'),
    ('WO2','Emp2','Group2','Stop','7/27/23 14:00'),
    ('WO2','Emp2','Group2','Start','7/27/23 15:00'),
    ('WO2','Emp2','Group2','Stop','7/27/23 16:00'),
    ('WO3','Emp3','Group3','Start','7/27/23 17:00'),-- (problem here: since the next row is also a Start, then this row should be returned)
    ('WO3','Emp3','Group3','Start','7/27/23 18:00'),
    ('WO3','Emp3','Group3','Start','7/27/23 19:00'),
    ('WO3','Emp3','Group3','Stop','7/27/23 20:00'),
    ('WO4','Emp4','Group4','Stop','7/27/23 17:00'),-- (problem here: since the dataset for this partition starts with a Stop instead of a Start, then this row should be returned)
    ('WO4','Emp4','Group4','Start','7/27/23 18:00'),
    ('WO4','Emp4','Group4','Start','7/27/23 19:00'),
    ('WO4','Emp4','Group4','Stop','7/27/23 20:00');
    
    with cte as (
      select *
        , row_number() over (partition by WorkOrder, Employee, FunctionGroup order by [Timestamp]) rn
        , lead(FunctionType) over (partition by WorkOrder, Employee, FunctionGroup order by [Timestamp]) FunctionTypeLead
      from @TestData
    )
    select WorkOrder, Employee, FunctionGroup, FunctionType, [Timestamp]
    from cte
    where rn%2 = 1 and FunctionTypeLead != 'Stop'
    order by WorkOrder, Employee, FunctionGroup, [Timestamp];
    

    Returns:

    WorkOrder Employee FunctionGroup FunctionType Timestamp
    WO3 Emp3 Group3 Start 2023-07-27 17:00:00.000
    WO4 Emp4 Group4 Stop 2023-07-27 17:00:00.000

    Note: Providing the DDL+DML (as shown here) makes it much easier to answer.