Search code examples
sqlgoogle-bigquerylag

Restrict LAG to specific row condition


I am using the following query to return the percentage difference between this month and last month for a given Site ID.

SELECT 
reporting_month,
total_revenue,
invoice_count,
--total_revenue_prev,
--invoice_count_prev,
ROUND(SAFE_DIVIDE(total_revenue,total_revenue_prev)-1,4) AS actual_growth,
site_name
FROM (

SELECT DATE_TRUNC(table.date, MONTH) AS reporting_month,
        ROUND(SUM(table.revenue),2) AS total_revenue, 
        COUNT(*) AS invoice_count,
        ROUND(IFNULL(
           LAG(SUM(table.revenue)) OVER (ORDER BY MIN(DATE_TRUNC(table.date, MONTH))) ,
           0),2) AS total_revenue_prev,
        IFNULL(
           LAG(COUNT(*)) OVER (ORDER BY MIN(DATE_TRUNC(table.date, MONTH))) ,
           0) AS invoice_count_prev,
        tbl_sites.name AS site_name
FROM table
LEFT JOIN tbl_sites ON tbl_sites.id = table.site
WHERE table.site = '123'
GROUP BY site_name, reporting_month
ORDER BY reporting_month
)

This is working correctly, printing:

reporting_month total revenue invoice_count actual_growth site_name
2020-11-01 00:00:00 UTC 100.00 10 0.571 SiteNameString
2020-12-01 00:00:00 UTC 125.00 7 0.2500 SiteNameString

However I would like to be able to run the same query for all sites. When I remove WHERE table.site = '123' from the subquery, I assume it is the use of LAG that is making the numbers report incorrectly. Is there a way to restrict the LAG to the 'current' row site?


Solution

  • You can simply add a PARTITION BY clause in your LAG statement to define a window function :

    LAG(SUM(table.revenue)) OVER (PARTITION BY table.site ORDER BY table.date, MONTH)
    

    Here is the related BigQuery documentation page

    "PARTITION BY: Breaks up the input rows into separate partitions, over which the analytic function is independently evaluated."