Search code examples
sqlgoogle-cloud-platformgoogle-bigquery

Flagging a row in new column based on conditions on previous and current rows


I have a items_invoiced table I have ordered that invoice table based on service id and invoice date so you will see groups of invoice_items that have renew peridically, every month. year. etc.

items_invoiced

invoice_id service_id invoice_date plan_id paid_amount
01 1 2024-03-08 2 30.00
09 1 2023-03-08 1 25.00
08 2 2024-04-08 1 30.00
034 2 2024-03-08 1 25.00
013 3 2024-03-08 2 30.00
010 3 2023-03-08 1 25.00
013 3 2021-03-08 1 30.00
010 3 2020-03-08 1 25.00

I need to find a way to add another column called 'plan_upgrade' that will look into each group of invoices that belong to a service_id and if there's a change in the plan_id (previous plan id es less than current plan_id) it would return TRUE, if not FALSE

items_invoiced_enhanced

invoice_id service_id invoice_date plan_id paid_amount plan_upgrade
01 1 2024-03-08 2 30.00 TRUE
09 1 2023-03-08 1 25.00 FALSE
08 2 2024-04-08 1 30.00 FALSE
034 2 2024-03-08 1 25.00 FALSE
013 3 2024-03-08 2 30.00 TRUE
010 3 2023-03-08 1 25.00 FALSE
013 3 2021-03-08 1 30.00 FALSE
010 3 2020-03-08 1 25.00 FALSE

notice that the table above shows 3 services, ids 1, 2 and 3, the first service gets renewed every year and the plan id changed on the second renewal, the 2nd service gets renewed every month but the plan id did not changed on the renewal, the third gets also renewed every year for the past 4 years and the upgrade (change of plan id from 1 to 2) happened on the last renewal year

is there a way to do this in BQ?

I'm aware thereś a window function, but understanding how it works it's sort of a challenge at my level of of SQl and BQ knowledge, so if someone can explain to me how this would actually be accomplished with a window function or another approach, I´d really appreciate it.


Solution

  • Window functions perform calculations over a set of rows, which is defined by OVER. In your case, service_id and the last invoice_date is the set you want to have the calculation performed on. PARTITION BY says the independent grouping in the row subsets you want the calculation performed on, service_id. ORDER BY gives a ordering that you want the calculation to occur, invoice_date since you want to see what happened on the previous date. The LAG() function looks at the previous row (which is sorted now) and in your case what was the previous plan_id for the last service_date for this particular service_id.

    SQL Fiddle


    -- INIT database
    CREATE TABLE invoices (
        invoice_id VARCHAR(10),
        service_id INT,
        invoice_date DATE,
        plan_id INT,
        paid_amount DECIMAL(10, 2)
    );
    INSERT INTO invoices (invoice_id, service_id, invoice_date, plan_id, paid_amount) VALUES
    ('01', 1, '2024-03-08', 2, 30.00),
    ('09', 1, '2023-03-08', 1, 25.00),
    ('08', 2, '2024-04-08', 1, 30.00),
    ('034', 2, '2024-03-08', 1, 25.00),
    ('013', 3, '2024-03-08', 2, 30.00),
    ('010', 3, '2023-03-08', 1, 25.00),
    ('013', 3, '2021-03-08', 1, 30.00),
    ('010', 3, '2020-03-08', 1, 25.00);
    -- QUERY database
    SELECT 
        invoice_id,
        service_id,
        invoice_date,
        plan_id,
        paid_amount,
        CASE
            WHEN plan_id != LAG(plan_id) OVER (PARTITION BY service_id ORDER BY invoice_date) THEN TRUE
            ELSE FALSE
        END AS plan_upgrade
    FROM invoices
    ORDER BY service_id, invoice_date;