Search code examples
sqlgroup-bygoogle-bigquerycumulative-sumrunning-count

Generate summary with counts of joiners, leavers and running count of users


I have a table with a large number of events that have happened, like the following:

ts event
2023-05-09 19:20:19 UTC joined
2023-01-16 09:34:02 UTC joined
2022-08-19 10:02:44 UTC left
2022-10-06 10:11:12 UTC joined
2021-10-06 10:11:13 UTC left

Each event involves either a "joiner" or a "leaver", with a TIMESTAMP "ts" associated. I need to create a report that gives joiners, leavers and total users for each month. The total users would be a running total of the number of users, like the following:

Year Month Joiners Leavers Total
2023 03 50 0 50
2023 04 5 1 54
2023 05 30 10 74
2023 06 10 5 79

I need to compute:

  • count for "Joiners", namely the number of "joined" events that happened in the month,
  • count for "Leavers", which would be the number of "left" events that happened in the month
  • "Total", as a running total number of joined users minus left users (if 10 people joined and 2 people left, then the actual total number of people would be 8; if 10 more people joined next month and 0 people left then the Total needs to be 18 in the next month)

Here's what I've tried so far:

SELECT EXTRACT(YEAR FROM ts) year,
       EXTRACT(MONTH FROM ts) month,
       event,
       COUNT(event) AS total,
       SUM(COUNT(event)) OVER(ORDER BY event) AS running_total
FROM `data.events`
GROUP BY year, month, event
ORDER BY year ASC, month ASC, event ASC

This is the closest that I've achieved so far with a very basic understanding of SQL, however it's not what I was aiming for. I can't get the COUNT or SUM running total to work.

Any help that could point me in the right direction would be appreciated.


Solution

  • You shall first generate your two aggregated values "Joiners" and "Leavers" with conditional aggregation, then use a running sum on Joiners-Leavers.

    WITH cte AS (
        SELECT EXTRACT(YEAR FROM ts) year,
               EXTRACT(MONTH FROM ts) month,
               SUM(CASE WHEN event = 'joined' THEN 1 ELSE 0 END)  AS Joiners,
               SUM(CASE WHEN event = 'left'   THEN 1 ELSE 0 END)  AS Leavers
        FROM `data.events`
        GROUP BY EXTRACT(YEAR FROM ts),
                 EXTRACT(MONTH FROM ts)
    )
    SELECT cte.*, 
           SUM(Joiners-Leavers) OVER(ORDER BY year, month) AS Total
    FROM cte 
    ORDER BY year, month