Search code examples
sqlgoogle-cloud-platformgoogle-bigquery

Add Date column base on other columns in Bigquery


I'm using Bigquery and trying to find a way to add a new date column based on a date that already exists. In the example below, the first record starts on 2024-01-04 (DeliveryDate). Which means the cost for this day was 178086.0 and so on.

Input Data

DeliveryDate    Month   PlanSystemId    cost
2024-01-04     2024-01  92059548         178086.0
2024-01-06     2024-01  92059548         165132.0
2024-01-07     2024-01  92059548         180159.0
2024-01-09     2024-01  92059548         178991.0

I need to add a Date column that starts at the start of the month (2024-01) and put 0 in the cost column because we haven't received any data from those dates.

Output Data

Date        DeliveryDate    Month   PlanSystemId    cost
2024-01-01   null          2024-01  92059548         0
2024-01-02   null          2024-01  92059548         0
2024-01-03   null          2024-01  92059548         0
2024-01-04   2024-01-04    2024-01  92059548         178086.0
2024-01-05   null          2024-01  92059548         0
2024-01-06   2024-01-06    2024-01  92059548         165132.0
2024-01-07   2024-01-07    2024-01  92059548         180159.0
2024-01-08   null          2024-01  92059548         0
2024-01-09   2024-01-09    2024-01  92059548         178991.0
2024-01-10   null          2024-01  92059548         0
.....
2024-01-31   null          2024-01  92059548         0

I did a CTE that covers all days of the month, then I do a left join with the source table to get all the dates, but I can't seem to populate any other columns with data. I hope you can help me find a solution.

Thanks in advance


Solution

  • One canonical way to handle this problem is to use a calendar table, which is along the lines of what you were already doing. Consider the following approach:

    WITH dates AS (
        SELECT dt
        FROM UNNEST(GENERATE_DATE_ARRAY('2024-01-01', '2024-01-31', INTERVAL 1 DAY)) AS dt
    ),
    systems AS (
        SELECT DISTINCT PlanSystemId
        FROM yourTable
    )
    
    SELECT
        d.dt,
        t.DeliveryDate,
        PARSE_DATE('%Y-%m', d.dt) AS Month,
        s.PlanSystemId,
        COALESCE(t.cost, 0) AS cost
    FROM dates d
    CROSS JOIN systems s
    LEFT JOIN yourTable t
        ON t.DeliveryDate = d.dt AND
           t.PlanSystemId = s.PlanSystemId
    ORDER BY
        s.PlanSystemId,
        d.dt;
    

    In the above query, we generate a set of all dates desired to appear in the report. Here I am simply including all dates in January, 2024, but you may extend this as needed. We also generate a set of all distinct PlanSystemId values. In the case of both CTEs, you may choose to replace them with bona fide tables as needed. Then we left join to your current table to generate the desired report.