Search code examples
sqlsql-serverdate

Join tables by date ON less than or equal


I want to calculate users' profits based on their daily balance.

Sample data per user: (vTnxDaily)

DateOfDay Balance
2024-02-20 $250
2024-02-12 $80
2024-02-01 $420

Expected output:

DateOfDay Profit
2024-02-29 $250 / 3
2024-02-28 $250 / 3
... ...
2024-02-20 $250 / 3
2024-02-19 $80 / 3
... ...
2024-02-12 $80 / 3
2024-02-11 $420 / 3
... ...
2024-02-01 $420 / 3

Since users do not have daily transactions, gaps must be filled with the latest available data. For instance, if a user only has transactions on Feb 1 and Feb 20, then all days below 20 are calculated based on the last data, which is Feb 1. And for days above 20, the last date used is Feb 20.

Try to calculate all days

Using this question, all days in a month are produced. When I try to join, I cannot eliminate (get rid of) either duplicate or null values.

Try 1

-- produce all days
Declare @year int = 2024, @month int = 2;
WITH numbers
as
(
    Select 1 as value
    UNion ALL
    Select value + 1 from numbers
    where value + 1 <= Day(EOMONTH(datefromparts(@year,@month,1)))
)
SELECT datefromparts(@year,@month,numbers.value) Datum 
into #t
FROM numbers

-- join by the daily view
select t.Datum,v.*
from #t t left join
    vTnxDaily v on t.Datum=v.DateOfDay and v.UserId='user id'

drop table #t

Wrong result (1)

It produces null rows for missed days

DateOfDay Profit
2024-02-29 null
2024-02-28 null
... ...
2024-02-20 $250 / 3
2024-02-19 null
... ...
2024-02-12 $80 / 3
2024-02-11 null
... ...
2024-02-01 $420 / 3

Try 2

To remove null values, I manipulated the join and used <= but it raises another problem: duplicate rows.

select t.Datum,v.*
from #t t left join
    vTnxDaily v on t.Datum>=v.DateOfDay and v.UserId='user id'

Wrong result (2)

It produces duplicate (or more) rows for higher days

DateOfDay Profit
2024-02-29 $250 / 3
2024-02-29 $80 / 3
2024-02-29 $420 / 3
2024-02-28 $250 / 3
... ...
2024-02-20 $250 / 3
2024-02-20 $80 / 3
2024-02-20 $420 / 3
2024-02-19 $80 / 3
2024-02-19 $420 / 3
... ...
2024-02-12 $80 / 3
2024-02-12 $420 / 3
2024-02-11 $420 / 3
... ...
2024-02-01 $420 / 3

Finally

The second query produces the correct value, but duplicate rows must be removed. I also have tried to use UNION and some other options, but I failed.

Reproduce

MS SQL Server 2019, Live Sample on DB Fiddle


Solution

  • Assuming you are on the latest version of SQL Server (2022 at time of writing), you can use GENERATE_SERIES to get all the dates you need (an rCTE is not the most performant way), and then use the IGNORE NULLS feature added to functions like LAST_VALUE to get the last non-NULL value:

    SELECT *
    INTO dbo.YourTable
    FROM (VALUES(CONVERT(date,'2024-02-20'), CONVERT(decimal(18,4),250)),
                (CONVERT(date,'2024-02-12'), CONVERT(decimal(18,4),80)),
                (CONVERT(date,'2024-02-01'), CONVERT(decimal(18,4),420)))V(DateOfDay,Balance);
    
    GO
    WITH DateBoundaries AS (
        SELECT MIN(DateOfDay) AS StartDate,
               EOMONTH(GETDATE()) AS EndDate
        FROM dbo.YourTable),
    Dates AS(
        SELECT DATEADD(DAY, GS.value,DB.StartDate) AS CalendarDate
        FROM DateBoundaries DB
             CROSS APPLY GENERATE_SERIES(0,DATEDIFF(DAY,DB.StartDate,DB.EndDate)) GS)
    SELECT D.CalendarDate,
           LAST_VALUE(YT.Balance) IGNORE NULLS OVER (ORDER BY D.CalendarDate) AS Profit
    FROM Dates D
         LEFT JOIN dbo.YourTable YT ON D.CalendarDate = YT.DateOfDay
    ORDER BY D.CalendarDate DESC;
    
    GO
    
    DROP TABLE dbo.YourTable;
    

    If you aren't on SQL Server 2022+, you'll need to be more "imaginative". I use a Tally here to create the dates instead, (which again is more performant than an rCTE). Then an APPLY to get the "latest" balance for said date:

    WITH DateBoundaries AS (
        SELECT MIN(DateOfDay) AS StartDate,
               EOMONTH(GETDATE()) AS EndDate
        FROM dbo.YourTable),
    N AS(
        SELECT N
        FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
    Tally AS(
        SELECT 0 AS I
        UNION ALL
        SELECT TOP (SELECT DATEDIFF(DAY,DB.StartDate,DB.EndDate) FROM DateBoundaries DB)
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM N N1, N N2,N N3), --UP to 1,000 days
    Dates AS(
        SELECT DATEADD(DAY, T.I,DB.StartDate) AS CalendarDate
        FROM DateBoundaries DB
             CROSS JOIN Tally T)
    SELECT D.CalendarDate,
           B.Balance AS Profit
    FROM Dates D
         OUTER APPLY (SELECT TOP (1) YT.Balance
                      FROM dbo.YourTable YT
                      WHERE YT.DateOfDay <= D.CalendarDate
                      ORDER BY YT.DateOfDay DESC) B
    ORDER BY D.CalendarDate DESC;
    

    db<>fiddle