Search code examples
sqlsql-serversubquerydate-arithmetic

how to add previous rows for all dates based on some ID if data does not exist in SQL


I have a table like this in SQL:

assetId price volume netVolume date transactionDateTime
1 200.00 100 100 2023-05-12 2023-05-12 10:32:10.000
1 200.00 100 200 2023-05-12 2023-05-12 10:50:10.000
3 300.00 100 100 2023-05-15 2023-05-15 11:40:46.000
2 200.00 100 500 2023-05-15 2023-05-15 12:18:01.000
3 300.00 100 100 2023-05-16 2023-05-16 11:50:31.000
2 200.00 100 600 2023-05-16 2023-05-16 12:20:18.000

as you see for some days we may have multiple rows with same assetId (2023-05-12), or we may not have any thing. I want to create a CTE to get a new table with this rules:

  1. start from a @startDate and go to an @endDate, day by day
  2. if there are multiple rows of the same assetId just pick the last one (based on transactionDateTime)
  3. if for a specific date there is no row for an assetId, add the last one existing but change the date to the current date, all other data must remain unchainged. do not add a row if you do not find any previous data. I managed to do the first part using below CTE, but I can not find a way to do the second part:
with CTE AS (
        SELECT assetId, rowNumber, price, volume, netVolume, date, transactionDateTime,
        ROW_NUMBER()OVER(PARTITION BY td.assetId,td.date order by td.transactionDateTime desc) AS RN
        FROM transactiondData td
)
    SELECT assetId, rowNumber, price, volume, netVolume, date, transactionDateTime
    FROM CTE
    WHERE RN = 1 AND date >= @startDate AND date <= @endDate
    ORDER BY transactionDateTime;

at the end I must have a table like this, think @startDate = 2023-05-13 and @endDate = 2023-05-17 :

assetId price volume netVolume date transactionDateTime
1 200.00 100 200 2023-05-13 2023-05-15 10:50:10.000
1 200.00 100 200 2023-05-14 2023-05-15 10:50:10.000
1 200.00 100 200 2023-05-15 2023-05-15 10:50:10.000
3 300.00 100 100 2023-05-15 2023-05-15 11:40:46.000
2 200.00 100 500 2023-05-15 2023-05-15 12:18:01.000
3 300.00 100 100 2023-05-16 2023-05-16 11:50:31.000
1 200.00 100 200 2023-05-16 2023-05-15 10:50:10.000
2 200.00 100 600 2023-05-16 2023-05-16 12:20:18.000
3 300.00 100 100 2023-05-17 2023-05-16 11:50:31.000
1 200.00 100 200 2023-05-17 2023-05-15 10:50:10.000
2 200.00 100 600 2023-05-17 2023-05-16 12:20:18.000

how can I do this?


Solution

  • We can use a a recursive query to generate the date range, then cross join it with the list of assets: this gives us possible combinations. Then, a simple approach to bring the corresponding transaction is apply with top:

    with dates as (
        select @startDate dt
        union all select dateadd(day, 1, dt) from dates where dt < @endDate
    )
    select t.assetId, t.price, t.volume, t.netVolume, d.dt as date, t.transactionDateTime
    from dates d
    cross join (select distinct assetId from transactiondData) a
    cross apply (
        select top (1) t.*
        from transactiondData t
        where t.date <= d.dt and t.assetId = a.assetId
        order by t.transactionDateTime desc
    ) t
    order by d.dt, t.transactionDateTime
    

    Note: since we use cross apply, "missing" rows that have no preceding transactions are skipped, because the subquery returns no rows.

    Demo on DB Fiddle:

    assetId price volume netVolume date transactionDateTime
    1 200.00 100 200 2023-05-13 2023-05-12 10:50:10.000
    1 200.00 100 200 2023-05-14 2023-05-12 10:50:10.000
    1 200.00 100 200 2023-05-15 2023-05-12 10:50:10.000
    3 300.00 100 100 2023-05-15 2023-05-15 11:40:46.000
    2 200.00 100 500 2023-05-15 2023-05-15 12:18:01.000
    1 200.00 100 200 2023-05-16 2023-05-12 10:50:10.000
    3 300.00 100 100 2023-05-16 2023-05-16 11:50:31.000
    2 200.00 100 600 2023-05-16 2023-05-16 12:20:18.000
    1 200.00 100 200 2023-05-17 2023-05-12 10:50:10.000
    3 300.00 100 100 2023-05-17 2023-05-16 11:50:31.000
    2 200.00 100 600 2023-05-17 2023-05-16 12:20:18.000