I have a table with a large number of events that have happened, like the following:
ts | event |
---|---|
2023-05-09 19:20:19 UTC | joined |
2023-01-16 09:34:02 UTC | joined |
2022-08-19 10:02:44 UTC | left |
2022-10-06 10:11:12 UTC | joined |
2021-10-06 10:11:13 UTC | left |
Each event involves either a "joiner" or a "leaver", with a TIMESTAMP "ts" associated. I need to create a report that gives joiners, leavers and total users for each month. The total users would be a running total of the number of users, like the following:
Year | Month | Joiners | Leavers | Total |
---|---|---|---|---|
2023 | 03 | 50 | 0 | 50 |
2023 | 04 | 5 | 1 | 54 |
2023 | 05 | 30 | 10 | 74 |
2023 | 06 | 10 | 5 | 79 |
I need to compute:
Here's what I've tried so far:
SELECT EXTRACT(YEAR FROM ts) year,
EXTRACT(MONTH FROM ts) month,
event,
COUNT(event) AS total,
SUM(COUNT(event)) OVER(ORDER BY event) AS running_total
FROM `data.events`
GROUP BY year, month, event
ORDER BY year ASC, month ASC, event ASC
This is the closest that I've achieved so far with a very basic understanding of SQL, however it's not what I was aiming for. I can't get the COUNT or SUM running total to work.
Any help that could point me in the right direction would be appreciated.
You shall first generate your two aggregated values "Joiners" and "Leavers" with conditional aggregation, then use a running sum on Joiners-Leavers
.
WITH cte AS (
SELECT EXTRACT(YEAR FROM ts) year,
EXTRACT(MONTH FROM ts) month,
SUM(CASE WHEN event = 'joined' THEN 1 ELSE 0 END) AS Joiners,
SUM(CASE WHEN event = 'left' THEN 1 ELSE 0 END) AS Leavers
FROM `data.events`
GROUP BY EXTRACT(YEAR FROM ts),
EXTRACT(MONTH FROM ts)
)
SELECT cte.*,
SUM(Joiners-Leavers) OVER(ORDER BY year, month) AS Total
FROM cte
ORDER BY year, month