Search code examples
sqlsql-servert-sqlsql-server-2017

Continue record sequence in blank months until another record appears


I have a table that continues Client IDs and records a score based on their result. What I'm struggling with is I need to show the scores by month, but carry over scores from previous months where the Client didn't complete a Test in the month that has passed or the current month.

For example, ClientID 1 did Tests in Nov 2023, Jan 2024, but didn't in Dec 2023, Feb 2024 or March 2024. So the query needs to carry the value across from Nov 2023 into Dec 2023 and then the value from Jan 2024 into February 2024 and March 2024.

I'm really unsure how to go about this. Does anyone have any ideas?

    Create Table #temp
(
    ClientID int,
    DateCreated datetime,
    Score decimal(18,2)
)

insert into #temp
(
    ClientID,
    DateCreated,
    Score
)
select
    1,
    '01 Nov 2023',
    56
union all
select
    1,
    '15 Jan 2024',
    90
union all
select
    2,
    '08 Dec 2023',
    76
union all
select
    2,
    '25 Jan 2024',
    98
union all 
select
    2,
    '01 Mar 2024',
    23

Expected Results

ClientID DateCreated Score Comments
1 2023-11-01 00:00:00.000 56.00
1 2023-12-01 00:00:00.000 56.00 (auto-generated)
1 2024-01-15 00:00:00.000 90.00
1 2024-02-01 00:00:00.000 90.00 (auto-generated)
1 2024-03-01 00:00:00.000 90.00 (auto-generated)
2 2023-12-08 00:00:00.000 76.00
2 2024-01-25 00:00:00.000 98.00
2 2024-02-01 00:00:00.000 98.00 (auto-generated)
2 2024-03-01 00:00:00.000 23.00 (auto-generated)

This is not like this question: Get the last non null value? because I don't have a NULL values for the Dates. The original Dates don't actually exist. That question is also four years old and the accepted answer mentioned there are better ways of doing it with new functionality coming.


Solution

  • It looks like your problem has two parts:

    1. Generate rows for month/client combinations for which the original table has no data.
    2. Populate the score in those generated rows with the most recent available prior value.

    The first part can be done by generating a list or all months of interest (a recursive CTE is one technique) plus a distinct list of all client IDs. You can then cross join the two sources and apply a WHERE NOT EXISTS(...) condition to exclude those for which data already exists.

    You can then use the CROSS APPLY(SELECT TOP 1 ... ORDER BY ...) pattern to look up the "best matching" (most recent prior) data row to retrieve that score. That will give you the desired generated rows.

    This can then be combined with your original data using UNION ALL to get your final result.

    The completed query would be something like:

    WITH Months AS (
        SELECT DATEADD(month, DATEDIFF(month, 0,
                 (SELECT MIN(T.DateCreated) FROM #Temp T)
                 ), 0) AS Month  -- Tricky, equivalent of DATETRUNC(month, ...) 
        UNION ALL
        SELECT DATEADD(month, 1, M.Month) AS Month
        FROM Months M
        WHERE DATEADD(month, 1, M.Month) < GETDATE()
    ),
    Clients AS (
        SELECT DISTINCT T.ClientId
        FROM #Temp T
    )
    SELECT T.ClientID, M.Month AS DateCreated, T.Score, Comments = '(auto-generated)'
    FROM Months M
    CROSS APPLY Clients C
    CROSS APPLY (
        SELECT TOP 1 *
        FROM #Temp T
        WHERE T.ClientId = C.ClientId
        AND T.DateCreated < M.Month -- Prior
        ORDER BY T.DateCreated DESC -- Most Recent
    ) T
    WHERE NOT EXISTS (
        -- Row does not exists for current month and client
        SELECT *
        FROM #Temp T
        WHERE T.ClientId = C.ClientId
        AND T.DateCreated >= M.Month
        AND T.DateCreated < DATEADD(month, 1, M.Month)
    )
    UNION ALL
    SELECT T.ClientID, T.DateCreated, T.Score, Comments = ''
    FROM #Temp T
    ORDER BY CLientId, DateCreated
    

    Results:

    ClientID DateCreated Score Comments
    1 2023-11-01 00:00:00.000 56.00
    1 2023-12-01 00:00:00.000 56.00 (auto-generated)
    1 2024-01-15 00:00:00.000 90.00
    1 2024-02-01 00:00:00.000 90.00 (auto-generated)
    1 2024-03-01 00:00:00.000 90.00 (auto-generated)
    2 2023-12-08 00:00:00.000 76.00
    2 2024-01-25 00:00:00.000 98.00
    2 2024-02-01 00:00:00.000 98.00 (auto-generated)
    2 2024-03-01 00:00:00.000 23.00

    See this db<>fiddle (for SQL Server 2017) for a demo.

    The above works for SQL Server 2017. If we were using later versions of SQL Server, we could use the DATE_TRUNC() function and perhaps a form of GENERATE_SERIES() to simplify the month-range CTE.