I want to calculate users' profits based on their daily balance.
DateOfDay | Balance |
---|---|
2024-02-20 | $250 |
2024-02-12 | $80 |
2024-02-01 | $420 |
DateOfDay | Profit |
---|---|
2024-02-29 | $250 / 3 |
2024-02-28 | $250 / 3 |
... | ... |
2024-02-20 | $250 / 3 |
2024-02-19 | $80 / 3 |
... | ... |
2024-02-12 | $80 / 3 |
2024-02-11 | $420 / 3 |
... | ... |
2024-02-01 | $420 / 3 |
Since users do not have daily transactions, gaps must be filled with the latest available data. For instance, if a user only has transactions on Feb 1 and Feb 20, then all days below 20 are calculated based on the last data, which is Feb 1. And for days above 20, the last date used is Feb 20.
Using this question, all days in a month are produced. When I try to join, I cannot eliminate (get rid of) either duplicate or null values.
-- produce all days
Declare @year int = 2024, @month int = 2;
WITH numbers
as
(
Select 1 as value
UNion ALL
Select value + 1 from numbers
where value + 1 <= Day(EOMONTH(datefromparts(@year,@month,1)))
)
SELECT datefromparts(@year,@month,numbers.value) Datum
into #t
FROM numbers
-- join by the daily view
select t.Datum,v.*
from #t t left join
vTnxDaily v on t.Datum=v.DateOfDay and v.UserId='user id'
drop table #t
It produces null rows for missed days
DateOfDay | Profit |
---|---|
2024-02-29 | null |
2024-02-28 | null |
... | ... |
2024-02-20 | $250 / 3 |
2024-02-19 | null |
... | ... |
2024-02-12 | $80 / 3 |
2024-02-11 | null |
... | ... |
2024-02-01 | $420 / 3 |
To remove null values, I manipulated the join and used <=
but it raises another problem: duplicate rows.
select t.Datum,v.*
from #t t left join
vTnxDaily v on t.Datum>=v.DateOfDay and v.UserId='user id'
It produces duplicate (or more) rows for higher days
DateOfDay | Profit |
---|---|
2024-02-29 | $250 / 3 |
2024-02-29 | $80 / 3 |
2024-02-29 | $420 / 3 |
2024-02-28 | $250 / 3 |
... | ... |
2024-02-20 | $250 / 3 |
2024-02-20 | $80 / 3 |
2024-02-20 | $420 / 3 |
2024-02-19 | $80 / 3 |
2024-02-19 | $420 / 3 |
... | ... |
2024-02-12 | $80 / 3 |
2024-02-12 | $420 / 3 |
2024-02-11 | $420 / 3 |
... | ... |
2024-02-01 | $420 / 3 |
The second query produces the correct value, but duplicate rows must be removed.
I also have tried to use UNION
and some other options, but I failed.
MS SQL Server 2019, Live Sample on DB Fiddle
Assuming you are on the latest version of SQL Server (2022 at time of writing), you can use GENERATE_SERIES
to get all the dates you need (an rCTE is not the most performant way), and then use the IGNORE NULLS
feature added to functions like LAST_VALUE
to get the last non-NULL
value:
SELECT *
INTO dbo.YourTable
FROM (VALUES(CONVERT(date,'2024-02-20'), CONVERT(decimal(18,4),250)),
(CONVERT(date,'2024-02-12'), CONVERT(decimal(18,4),80)),
(CONVERT(date,'2024-02-01'), CONVERT(decimal(18,4),420)))V(DateOfDay,Balance);
GO
WITH DateBoundaries AS (
SELECT MIN(DateOfDay) AS StartDate,
EOMONTH(GETDATE()) AS EndDate
FROM dbo.YourTable),
Dates AS(
SELECT DATEADD(DAY, GS.value,DB.StartDate) AS CalendarDate
FROM DateBoundaries DB
CROSS APPLY GENERATE_SERIES(0,DATEDIFF(DAY,DB.StartDate,DB.EndDate)) GS)
SELECT D.CalendarDate,
LAST_VALUE(YT.Balance) IGNORE NULLS OVER (ORDER BY D.CalendarDate) AS Profit
FROM Dates D
LEFT JOIN dbo.YourTable YT ON D.CalendarDate = YT.DateOfDay
ORDER BY D.CalendarDate DESC;
GO
DROP TABLE dbo.YourTable;
If you aren't on SQL Server 2022+, you'll need to be more "imaginative". I use a Tally here to create the dates instead, (which again is more performant than an rCTE). Then an APPLY
to get the "latest" balance for said date:
WITH DateBoundaries AS (
SELECT MIN(DateOfDay) AS StartDate,
EOMONTH(GETDATE()) AS EndDate
FROM dbo.YourTable),
N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT 0 AS I
UNION ALL
SELECT TOP (SELECT DATEDIFF(DAY,DB.StartDate,DB.EndDate) FROM DateBoundaries DB)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM N N1, N N2,N N3), --UP to 1,000 days
Dates AS(
SELECT DATEADD(DAY, T.I,DB.StartDate) AS CalendarDate
FROM DateBoundaries DB
CROSS JOIN Tally T)
SELECT D.CalendarDate,
B.Balance AS Profit
FROM Dates D
OUTER APPLY (SELECT TOP (1) YT.Balance
FROM dbo.YourTable YT
WHERE YT.DateOfDay <= D.CalendarDate
ORDER BY YT.DateOfDay DESC) B
ORDER BY D.CalendarDate DESC;