Search code examples
sqlazureviewlogic

SQL: How to create a daily view based on different time intervals using SQL logic?


Here is an example:

Id|price|Date
1|2|2022-05-21
1|3|2022-06-15
1|2.5|2022-06-19

Needs to look like this:

Id|Date|price
1|2022-05-21|2
1|2022-05-22|2
1|2022-05-23|2
...
1|2022-06-15|3
1|2022-06-16|3
1|2022-06-17|3
1|2022-06-18|3
1|2022-06-19|2.5
1|2022-06-20|2.5
... 
Until today 
1|2022-08-30|2.5

I tried using the lag(price) over (partition by id order by date) But i can't get it right.


Solution

  • I'm not familiar with Azure, but it looks like you need to use a calendar table, or generate missing dates using a recursive CTE.

    To get started with a recursive CTE, you can generate line numbers for each id (assuming multiple id values) in the source data ordered by date. These rows with row number equal to 1 (with the minimum date value for the corresponding id) will be used as the starting point for the recursion. Then you can use the DATEADD function to generate the row for the next day. To use the price values ​​from the original data, you can use a subquery to get the price for this new date, and if there is no such value (no row for this date), use the previous price value from CTE (use the COALESCE function for this).

    For SQL Server query can look like this

    WITH cte AS (
        SELECT 
            id,
            date,
            price
        FROM (
            SELECT
                *,
                ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
            FROM tbl
        ) t
        WHERE rn = 1
        UNION ALL
        SELECT 
            cte.id,
            DATEADD(d, 1, cte.date),
            COALESCE(
                (SELECT tbl.price 
                FROM tbl 
                WHERE tbl.id = cte.id AND tbl.date = DATEADD(d, 1, cte.date)), 
                cte.price
            )
        FROM cte
        WHERE DATEADD(d, 1, cte.date) <= GETDATE()
    )
    SELECT * FROM cte
    ORDER BY id, date
    OPTION (MAXRECURSION 0)
    

    Note that I added OPTION (MAXRECURSION 0) to make the recursion run through all the steps, since the default value is 100, this is not enough to complete the recursion.

    db<>fiddle here


    The same approach for MySQL (you need MySQL of version 8.0 to use CTE)

    WITH RECURSIVE cte AS (
        SELECT 
            id,
            date,
            price
        FROM (
            SELECT
                *,
                ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
            FROM tbl
        ) t
        WHERE rn = 1
        UNION ALL
        SELECT 
            cte.id,
            DATE_ADD(cte.date, interval 1 day),
            COALESCE(
                (SELECT tbl.price 
                FROM tbl 
                WHERE tbl.id = cte.id AND tbl.date = DATE_ADD(cte.date, interval 1 day)), 
                cte.price
            )
        FROM cte
        WHERE DATE_ADD(cte.date, interval 1 day) <= NOW()
    )
    SELECT * FROM cte
    ORDER BY id, date
    

    db<>fiddle here


    Both queries produces the same results, the only difference is the use of the engine's specific date functions.


    For MySQL versions below 8.0, you can use a calendar table since you don't have CTE support and can't generate the required date range.

    Assuming there is a column in the calendar table to store date values ​​(let's call it date for simplicity) you can use the CROSS JOIN operator to generate date ranges for the id values in your table that will match existing dates. Then you can use a subquery to get the latest price value from the table which is stored for the corresponding date or before it.

    So the query would be like this

    SELECT
       d.id,
       d.date,
       (SELECT
            price
        FROM tbl
        WHERE tbl.id = d.id AND tbl.date <= d.date
        ORDER BY tbl.date DESC
        LIMIT 1
        ) price
    FROM (
        SELECT 
            t.id,
            c.date
        FROM calendar c
        CROSS JOIN (SELECT DISTINCT id FROM tbl) t
        WHERE c.date BETWEEN (
            SELECT
                MIN(date) min_date
            FROM tbl
            WHERE tbl.id = t.id
        )
        AND NOW()
    ) d
    ORDER BY id, date
    

    Using my pseudo-calendar table with date values ranging from 2022-05-20 to 2022-05-30 and source data in that range, like so

    id price date
    1 2 2022-05-21
    1 3 2022-05-25
    1 2.5 2022-05-28
    2 10 2022-05-25
    2 100 2022-05-30

    the query produces following results

    id date price
    1 2022-05-21 2
    1 2022-05-22 2
    1 2022-05-23 2
    1 2022-05-24 2
    1 2022-05-25 3
    1 2022-05-26 3
    1 2022-05-27 3
    1 2022-05-28 2.5
    1 2022-05-29 2.5
    1 2022-05-30 2.5
    2 2022-05-25 10
    2 2022-05-26 10
    2 2022-05-27 10
    2 2022-05-28 10
    2 2022-05-29 10
    2 2022-05-30 100

    db<>fiddle here