Search code examples
sqlsql-servergreatest-n-per-groupwindow-functionsgaps-and-islands

Compensating for NULL values in SQL Queries


Let's say I have two tables of data that looks like this:

DvTestResults

DVTR_DeviceNo DVTR_TestedOnAt DVTR_TesterNo
DV00001 2022-08-11 14:15:16.000 0001
DV00001 2022-08-19 21:08:16.000 NULL
DV00001 2022-09-22 08:14:32.000 NULL
DV00002 2023-06-03 18:18:03.000 NULL
DV00002 2023-08-15 19:01:36.000 0007
DV00003 2022-12-23 08:04:47.000 0014
DV00003 2023-01-03 10:09:51.000 0014
DV00003 2023-01-09 08:01:33.000 0014
DV00004 2023-03-14 11:49:02.000 0298
DV00004 2023-03-15 09:08:13.000 0298
DV00005 2022-04-28 16:23:14.000 NULL
DV00005 2022-08-14 08:20:56.000 NULL

Tester

T_TesterNo T_TesterName
0001 John
0007 Stacy
0014 James
0298 Carlos

I want to find the last time each device was tested and who tested it, ordered by device number without any device numbers repeating.

I have the following code:

SELECT DvTestResults.DVTR_DeviceNo, max.LastTimeTested, Tester.T_TesterName 
FROM DvTestResults
INNER JOIN
(
    SELECT DVTR_DeviceNo, MAX(DVTR_TestedOnAt) as LastTimeTested 
    FROM DvTestResults
    GROUP BY DVTR_DeviceNo
) as max
    on max.DVTR_DeviceNo = DvTestResults.DVTR_DeviceNo and max.LastTimeTested = DvTestResults.DVTR_TestedOnAt
INNER JOIN Tester ON Tester.TesterNo = DvTestResults.DVTR_TesterNo
ORDER BY DvTestResults.DVTR_DeviceNo

Unfortunately, while the code works fine for units where the last test's DVTR_TesterNo is not NULL, it does not give values for when it is. What would be a good solution if I wanted to have a list of the last times a device number was tested and who tested even if it came up NULL. Preferably, like with DV00001 in this example, the tester is typically the same name of the last person who tested it (in this case, John) — even though they didn't log in to do the last test that DV00001 did. So, for this example, I'd like an output for DV00001 of:

DVTR_DeviceNo LastTimeTested T_TesterName
DV00001 2022-09-22 08:14:32.000 John

For a device like DV00005, which has only ever been tested anonymously, I'd like an output of:

DVTR_DeviceNo LastTimeTested T_TesterName
DV00005 2022-08-14 08:20:56.000 Anonymous

Can anyone help?


Solution

  • Getting the latest test per device is a typical top-1-per-group problem, which we can approach with row_number() and filtering:

    select *
    from (
        select r.*,
            row_number() over(partition by DVTR_DeviceNo order by DVTR_TestedOnAt desc) rn
        from DvTestResults r
    ) r
    where rn = 1
    

    We would then left join on the testers table to try and bring the tester’s name.

    Preferably, like with DV00001 in this example, the tester is typically the same name of the last person who tested it (in this case, John)

    Retrieving the latest tester (hence ignoring null values) is a more complex task in SQL Server. We can either use apply, or more window functions. The latter would be:

    select r.DVTR_DeviceNo, r.DVTR_TestedOnAt, coalesce(t.T_TesterName, 'Anonymous') T_TesterName
    from (
        select r.*,
            max(DVTR_TesterNo) over(partition by DVTR_DeviceNo, grp) LastDVTR_TesterNo
        from (
            select r.*,
                row_number()         over(partition by DVTR_DeviceNo order by DVTR_TestedOnAt desc) rn,
                count(DVTR_TesterNo) over(partition by DVTR_DeviceNo order by DVTR_TestedOnAt) grp
            from DvTestResults r
        ) r
    ) r
    left join Tester t on t.T_TesterNo = r.LastDVTR_TesterNo
    where r.rn = 1
    

    Related: How to make LAG() ignore NULLs in SQL Server

    Demo, based on the test data created by @marcothesane:

    DVTR_DeviceNo DVTR_TestedOnAt T_TesterName
    DV00001 2022-09-22 08:14:32.000 John
    DV00002 2023-08-15 19:01:36.000 Stacy
    DV00003 2023-01-09 08:01:33.000 James
    DV00004 2023-03-15 09:08:13.000 Carlos
    DV00005 2022-08-14 08:20:56.000 Anonymous