I have a table that continues Client IDs and records a score based on their result. What I'm struggling with is I need to show the scores by month, but carry over scores from previous months where the Client didn't complete a Test in the month that has passed or the current month.
For example, ClientID 1 did Tests in Nov 2023, Jan 2024, but didn't in Dec 2023, Feb 2024 or March 2024. So the query needs to carry the value across from Nov 2023 into Dec 2023 and then the value from Jan 2024 into February 2024 and March 2024.
I'm really unsure how to go about this. Does anyone have any ideas?
Create Table #temp
(
ClientID int,
DateCreated datetime,
Score decimal(18,2)
)
insert into #temp
(
ClientID,
DateCreated,
Score
)
select
1,
'01 Nov 2023',
56
union all
select
1,
'15 Jan 2024',
90
union all
select
2,
'08 Dec 2023',
76
union all
select
2,
'25 Jan 2024',
98
union all
select
2,
'01 Mar 2024',
23
Expected Results
ClientID | DateCreated | Score | Comments |
---|---|---|---|
1 | 2023-11-01 00:00:00.000 | 56.00 | |
1 | 2023-12-01 00:00:00.000 | 56.00 | (auto-generated) |
1 | 2024-01-15 00:00:00.000 | 90.00 | |
1 | 2024-02-01 00:00:00.000 | 90.00 | (auto-generated) |
1 | 2024-03-01 00:00:00.000 | 90.00 | (auto-generated) |
2 | 2023-12-08 00:00:00.000 | 76.00 | |
2 | 2024-01-25 00:00:00.000 | 98.00 | |
2 | 2024-02-01 00:00:00.000 | 98.00 | (auto-generated) |
2 | 2024-03-01 00:00:00.000 | 23.00 | (auto-generated) |
This is not like this question: Get the last non null value? because I don't have a NULL values for the Dates. The original Dates don't actually exist. That question is also four years old and the accepted answer mentioned there are better ways of doing it with new functionality coming.
It looks like your problem has two parts:
The first part can be done by generating a list or all months of interest (a recursive CTE is one technique) plus a distinct list of all client IDs. You can then cross join the two sources and apply a WHERE NOT EXISTS(...)
condition to exclude those for which data already exists.
You can then use the CROSS APPLY(SELECT TOP 1 ... ORDER BY ...)
pattern to look up the "best matching" (most recent prior) data row to retrieve that score. That will give you the desired generated rows.
This can then be combined with your original data using UNION ALL
to get your final result.
The completed query would be something like:
WITH Months AS (
SELECT DATEADD(month, DATEDIFF(month, 0,
(SELECT MIN(T.DateCreated) FROM #Temp T)
), 0) AS Month -- Tricky, equivalent of DATETRUNC(month, ...)
UNION ALL
SELECT DATEADD(month, 1, M.Month) AS Month
FROM Months M
WHERE DATEADD(month, 1, M.Month) < GETDATE()
),
Clients AS (
SELECT DISTINCT T.ClientId
FROM #Temp T
)
SELECT T.ClientID, M.Month AS DateCreated, T.Score, Comments = '(auto-generated)'
FROM Months M
CROSS APPLY Clients C
CROSS APPLY (
SELECT TOP 1 *
FROM #Temp T
WHERE T.ClientId = C.ClientId
AND T.DateCreated < M.Month -- Prior
ORDER BY T.DateCreated DESC -- Most Recent
) T
WHERE NOT EXISTS (
-- Row does not exists for current month and client
SELECT *
FROM #Temp T
WHERE T.ClientId = C.ClientId
AND T.DateCreated >= M.Month
AND T.DateCreated < DATEADD(month, 1, M.Month)
)
UNION ALL
SELECT T.ClientID, T.DateCreated, T.Score, Comments = ''
FROM #Temp T
ORDER BY CLientId, DateCreated
Results:
ClientID | DateCreated | Score | Comments |
---|---|---|---|
1 | 2023-11-01 00:00:00.000 | 56.00 | |
1 | 2023-12-01 00:00:00.000 | 56.00 | (auto-generated) |
1 | 2024-01-15 00:00:00.000 | 90.00 | |
1 | 2024-02-01 00:00:00.000 | 90.00 | (auto-generated) |
1 | 2024-03-01 00:00:00.000 | 90.00 | (auto-generated) |
2 | 2023-12-08 00:00:00.000 | 76.00 | |
2 | 2024-01-25 00:00:00.000 | 98.00 | |
2 | 2024-02-01 00:00:00.000 | 98.00 | (auto-generated) |
2 | 2024-03-01 00:00:00.000 | 23.00 |
See this db<>fiddle (for SQL Server 2017) for a demo.
The above works for SQL Server 2017. If we were using later versions of SQL Server, we could use the DATE_TRUNC()
function and perhaps a form of GENERATE_SERIES()
to simplify the month-range CTE.