Search code examples
sqlsql-server

SQL query of specific rows based on comparison of multiple date columns


I need to query rows out of a single table based on a comparison of multiple columns and multiple dates and group the results by a specific column.

The table has this basic structure:

engine read_date cyl1_read cyl2_read cyl1_last_maint_date cyl2_last_maint_date
1 06/31/2023 25 21 07/15/2023 05/25/2023
1 11/01/2023 27 19 07/15/2023 05/25/2023
2 04/11/2023 37 18 05/15/2023 05/25/2023
2 08/27/2023 19 12 06/01/2023

For each engine, I need to query the cyl1_read and cyl2_read values that were read after the most recent corresponding cyl last maintenance date, if the corresponding cyl last maint date exist, otherwise get all the cyl read values.

So, for this example the result would be:

engine read_date cyl1_read cyl2_read
1 06/31/2023 21
1 11/01/2023 27 19
2 08/27/2023 19 12

Since the cyl1_last_maint_date for engine 1 was on 7/15/2023, the cyl1_read that was read on 06/31/2023 would not be included, but the cyl2_read value that was read on 6/31/2023 would be included because the cyl2_last_maint_date (05/25/2023) was before the read date. The engine 2 values captured on 04/11/2023 would not be included because they were read before the last maint dates for both cyl1 and cyl2. The last engine 2 values read on 08/27/2023 would be included because there is no last maint date for cyl1 and the cyl2 last maint date is before the cyl2 last read date.

How can I accomplish this SQL query? I am running SQL Server 2016


Solution

  • I think this is easiest if you take a 2-step process

    • First, 'remove' any future reads from each row
    • Second, report on rows that have at least one value for the reads

    It is possible to do this in one step, but I think it's much cleaner and clearer to do it in two.

    Assuming the data is in the table #EngineData, The following SQL fragment only includes non-future reads

    SELECT  engine, read_date, 
            CASE WHEN cyl1_last_maint_date <= read_date OR cyl1_last_maint_date IS NULL THEN cyl1_read ELSE NULL END AS cyl1_last_read,
            CASE WHEN cyl2_last_maint_date <= read_date OR cyl2_last_maint_date IS NULL THEN cyl2_read ELSE NULL END AS cyl2_last_read
    FROM    #EngineData;
    

    Results

    engine      read_date  cyl1_last_read cyl2_last_read
    ----------- ---------- -------------- --------------
    1           2023-06-30 NULL           21
    1           2023-11-01 27             19
    2           2023-04-11 NULL           NULL
    2           2023-08-27 19             12
    

    Then we can simply select from this where the last read for either cyl1 or cyl2 have a value (e.g., both not NULL). The initial step is in the CTE below, and the WHERE clause then removes the rows with no reads.

    WITH PrevOnly_EngineData AS
            (SELECT engine, read_date, 
                    CASE WHEN cyl1_last_maint_date <= read_date OR cyl1_last_maint_date IS NULL THEN cyl1_read ELSE NULL END AS cyl1_last_read,
                    CASE WHEN cyl2_last_maint_date <= read_date OR cyl2_last_maint_date IS NULL THEN cyl2_read ELSE NULL END AS cyl2_last_read
            FROM    #EngineData
            )
        SELECT  *
        FROM    PrevOnly_EngineData
        WHERE   cyl1_last_read IS NOT NULL
                OR cyl2_last_read IS NOT NULL
        ORDER BY engine, read_date;
    

    Results

    engine      read_date  cyl1_last_read cyl2_last_read
    ----------- ---------- -------------- --------------
    1           2023-06-30 NULL           21
    1           2023-11-01 27             19
    2           2023-08-27 19             12
    

    You can see these on the following db<>fiddle. (Note - I replaced date 2023-06-31 with 2023-06-30).