I am using the following query to return the percentage difference between this month and last month for a given Site ID.
SELECT
reporting_month,
total_revenue,
invoice_count,
--total_revenue_prev,
--invoice_count_prev,
ROUND(SAFE_DIVIDE(total_revenue,total_revenue_prev)-1,4) AS actual_growth,
site_name
FROM (
SELECT DATE_TRUNC(table.date, MONTH) AS reporting_month,
ROUND(SUM(table.revenue),2) AS total_revenue,
COUNT(*) AS invoice_count,
ROUND(IFNULL(
LAG(SUM(table.revenue)) OVER (ORDER BY MIN(DATE_TRUNC(table.date, MONTH))) ,
0),2) AS total_revenue_prev,
IFNULL(
LAG(COUNT(*)) OVER (ORDER BY MIN(DATE_TRUNC(table.date, MONTH))) ,
0) AS invoice_count_prev,
tbl_sites.name AS site_name
FROM table
LEFT JOIN tbl_sites ON tbl_sites.id = table.site
WHERE table.site = '123'
GROUP BY site_name, reporting_month
ORDER BY reporting_month
)
This is working correctly, printing:
reporting_month | total revenue | invoice_count | actual_growth | site_name |
---|---|---|---|---|
2020-11-01 00:00:00 UTC | 100.00 | 10 | 0.571 | SiteNameString |
2020-12-01 00:00:00 UTC | 125.00 | 7 | 0.2500 | SiteNameString |
However I would like to be able to run the same query for all sites. When I remove WHERE table.site = '123'
from the subquery, I assume it is the use of LAG
that is making the numbers report incorrectly. Is there a way to restrict the LAG to the 'current' row site?
You can simply add a PARTITION BY clause in your LAG statement to define a window function :
LAG(SUM(table.revenue)) OVER (PARTITION BY table.site ORDER BY table.date, MONTH)
Here is the related BigQuery documentation page
"PARTITION BY: Breaks up the input rows into separate partitions, over which the analytic function is independently evaluated."