Search code examples
sql-servert-sql

LAST_VALUE function not working as expected


I am on Microsoft SQL Azure (RTM) - 12.0.2000.8 and I have a table that has a number of dates that a provider has received a rating.

I need to output a row for each provider for each month since Apr-22 with the provider's most recent rating as at that date.

This is the ratings table I have to work with:

Date Provider Rating
2022-04-30 A 1
2022-05-31 B 2
2022-07-31 A 5

I have tried to accomplish this using the LAST_VALUE function, but it doesn't fill this down.

WITH CTE_Dates AS ( --create date table as ratings table does not have record for every month

SELECT
EOMONTH([Date], 0) AS [ReportingDate]
FROM [tbl_Dates]
WHERE [Date] >= '2022-04-01'
AND [DATE] <= '2022-08-31'

GROUP BY EOMONTH([Date], 0)
)
,

CTE_Ratings AS (
SELECT
[ReportingDate]
,[Provider]
,[Rating]

FROM Ratings

)


SELECT
CTE_Dates.[ReportingDate]
,[Provider]
,[Rating]
,LAST_VALUE([Rating]) OVER (PARTITION BY CTE_Dates.[ReportingDate], [Provider] ORDER BY CTE_Dates.[ReportingDate]
        ) AS latest_rating

FROM CTE_Dates
LEFT OUTER JOIN CTE_Ratings
    ON CTE_Dates.[ReportingDate] = CTE_Ratings.[ReportingDate]

My output:

Date Provider Rating latest_rating
2022-04-30 A 1 1
2022-05-31 B 2 2
2022-06-30 NULL NULL NULL
2022-07-31 A 5 5
2022-08-31 NULL NULL NULL

Desired output:

Date Provider Rating latest_rating
2022-04-30 A 1 1
2022-05-31 A NULL 1
2022-06-30 A NULL 1
2022-07-31 A 5 5
2022-08-31 A NULL 5
2022-04-30 B NULL NULL
2022-05-31 B 2 2
2022-06-30 B NULL 2
2022-07-31 B NULL 2
2022-08-31 B NULL 2

Solution

  • The query you have, and the result you say you get are certainly not the same, if we are using the sample data you have given us. As such I've used your data to define a #Ratings table and used by own calendar table.

    Firstly, I get the last day of each month, which can be achieved with EOMONTH, but I don't get every date from the calendar table; the first day is enough, which can easily be returned by filter on your "Calendar Day" column.

    Then, it appears, we need one distinct row per date and provider, something your query is missing. This can be done with a CROSS JOIN between the Dates CTE and the Ratings table, and getting the DISTINCT values for the combination of the 2 columns. Ideally, however, you would CROSS JOIN to a Provider table, to eliminate the DISTINCT.

    Despite the sample data suggesting otherwise, I've assumed that the value of Rating is not always increasing. If it is, then a windowed MAX will work. I demonstrate this in the solution as well, as it would work on SQL Server 2019 and prior.

    Assuming it isn't always ascending, however, what you are missing here is the IGNORE NULLS operator (and the comparison of Provider in the ON), which is available in Azure and SQL Server 2022+. If you add that, you get the results you want, as NULL values are ignored for getting the last value.

    CREATE TABLE #Ratings (Date date,
                           Provider char(1),
                           Rating int);
    GO
    
    INSERT INTO #Ratings (Date,
                          Provider,
                          Rating)
    VALUES('2022-04-30','A',1),
          ('2022-05-31','B',2),
          ('2022-07-31','A',5);
    GO
    
    WITH Dates AS(
        SELECT EOMONTH(CalendarDate) AS [ReportingDate]
        FROM tbl.Calendar
        WHERE CalendarDate >= '2022-04-01'
          AND CalendarDate <= '2022-08-31'
          AND CalendarDay = 1),
    RatingDates AS(
        SELECT DISTINCT 
               R.Provider,
               D.ReportingDate
        FROM #Ratings R
             CROSS JOIN Dates D)
    SELECT RD.[ReportingDate],
           RD.[Provider],
           MAX(R.Rating) OVER (PARTITION BY RD.Provider ORDER BY RD.ReportingDate) AS RatingMax, --Won't work if Rating isn't always ascending
           LAST_VALUE(R.Rating) IGNORE NULLS OVER (PARTITION BY RD.Provider ORDER BY RD.ReportingDate) AS Rating
    FROM RatingDates RD
         LEFT OUTER JOIN #Ratings R ON RD.[ReportingDate] = R.Date
                                   AND RD.Provider = R.Provider
    ORDER BY RD.Provider,
             RD.ReportingDate;
    GO
    
    DROP TABLE #Ratings