Search code examples
sqlgroup-byconditional-statementscasessms

How to use a conditional to evaluate a paired variable in case statement


I'm working with a dataset (in SSMS) that looks something like this:

CaseID State EventTimeStamp Cleared
1 OrderReceived 2024-03-20 0
1 Completed 2024-03-21 0
1 Completed 2024-03-22 1
2 OrderReceived 2024-03-24 0
2 Completed 2024-03-25 0
2 Completed 2024-03-26 1
2 Completed 2024-03-27 0
3 OrderReceived 2024-03-28 0
3 OrderReceived 2024-03-29 1
3 Completed 2024-03-31 0

EDIT FOR CONTEXT: The actual dataset has ~40 States, but I'd like to track about 25. Some states REQUIRE other states to be marked done, but others don't and some can be in progress simultaneously. Any state can have an event Clear = 1, but it would require a Clear = 0 event first i.e. an event that didn't happen can't be cleared. Events for any state will happen in 0, 1, 0,... Cleared order every single time.

For the purpose of this dummy dataset the MAX(EventTimeStamp) for an OrderReceived could be Cleared = 1, and the MAX(EventTimeStamp) for the Completed event for the same CaseID could be Cleared = 0. There is also potential for a case to have MAX(EventTimeStamp) for OrderReceived to be NULL and MAX(EventTimeStamp) for Completed to have a value. Additionally, it is possible for MAX(EventTimeStamp) for OrderReceived to be AFTER MAX(EventTimeStamp) for Completed. In that case the desired result would simply look like

CaseID OrderReceived Completed
X 2024-03-20 2024-03-18

In that case the last EventTimeStamp would display for each. The engineer who made the database made didn't hardcode a lot of restrictions because a lot of changes are still being made to the workflow.

CREATE TABLE CaseTracking 
(
    [CaseID]    bigint,
    [State]     VARCHAR(512),
    [EventTimeStamp]    date,
    [Cleared]   bit
);

INSERT INTO CaseTracking ([CaseID], [State], EventTimeStamp, Cleared) VALUES
    ('1', 'OrderReceived', '2024-03-20', '0'),
    ('1', 'Completed', '2024-03-21', '0'),
    ('1', 'Completed', '2024-03-22', '1'),
    ('2', 'OrderReceived', '2024-03-24', '0'),
    ('2', 'Completed', '2024-03-25', '0'),
    ('2', 'Completed', '2024-03-26', '1'),
    ('2', 'Completed', '2024-03-27', '0'),
    ('3', 'OrderReceived', '2024-03-28', '0'),
    ('3', 'OrderReceived', '2024-03-29', '1'),
    ('3', 'Completed', '2024-03-31', '0');

SELECT * FROM CaseTracking

I'd like to take a look at how long it takes between certain states and so I organized the table by column State using "group by". I was hoping to make a table that evaluated the MAX(EventTimeStamp) for each state, but if Cleared = 1 then the value would return NULL instead. The desired result would look like this. For this database, states can be cleared and it's recorded the same as any other event, but with Cleared = 1. If the MAX(TimeEventStamp) has a Cleared = 1 value that means it still needs to be done which is why I'd like a NULL return. The value for CaseID = 2 Completed is 2024-3-27 since I'd want the value associated with the MAX(EventTimeStamp). If Cleared happens in 0, 1, 0 order I'd like the timestamp of the last 0. If Cleared happens in 0, 1, 0, 1 order I'd like a NULL return

CaseID OrderReceived Completed
1 2024-03-20 NULL
2 2024-03-24 2024-03-27
3 NULL 2024-03-31

I've been playing around in https://dbfiddle.uk/JG38HeOH

I tried an AND conditional within the CASE clause, but ran into an issue where instead of returning the NULL I want for CaseID = 1, I instead get the MAX(TimeEventStamp) for any time the conditions are met

SELECT CaseID 
  ,MAX(CASE WHEN State = 'OrderReceived' AND Cleared = 0 THEN EventTimeStamp 
            ELSE NULL END) AS OrderReceived
  ,MAX(CASE WHEN State = 'Completed' AND Cleared = 0 THEN EventTimeStamp 
            ELSE NULL END) AS Completed

FROM CaseTracking
GROUP BY CaseID

Results look like this

CaseID OrderReceived Completed
1 2024-03-20 2024-03-21
2 2024-03-24 2024-03-27

Solution

  • This gets the latest row for each state of each case, then pivots the rows to one row per case.

    WITH
      sorted AS
    (
      SELECT
        *,
        ROW_NUMBER()
          OVER (
            PARTITION BY CaseID, State
                ORDER BY EventTimeStamp DESC
          )
            AS row_id
      FROM
        CaseTracking
    )
    SELECT
      CaseID,
      MAX(CASE WHEN State = 'OrderReceived' THEN EventTimeStamp END) AS OrderReceived,
      MAX(CASE WHEN State = 'Completed'     THEN EventTimeStamp END) AS Completed
    FROM
      sorted
    WHERE
      row_id = 1
      AND
      Cleared = 0
    GROUP BY
      CaseID
    
    
    CaseID OrderReceived Completed
    1 2024-03-20 null
    2 2024-03-24 2024-03-27

    fiddle