Search code examples
sqlgroup-bygoogle-bigquerywindow-functions

SQL running total group by id and limited by a condition (over window)


I'm having the following data in an SQL table. See SQL fiddle for playing around.

id pointsEarned createdAt
234-00000206-0 500 2023-05-03T09:05:05.034Z
234-00000206-0 1000 2023-05-12T09:05:05.034Z
234-00000206-0 800 2023-05-15T09:05:05.034Z
234-00000206-0 300 2023-05-21T09:05:05.034Z
234-00000206-0 1100 2023-05-28T09:05:05.034Z

The data is created as follows.

CREATE TABLE activities (
  id varchar(14),
  pointsEarned int,
  createdAt varchar(24)
  );
  
INSERT INTO activities (id, pointsEarned, createdAt)
VALUES ('234-00000206-0', 500, '2023-05-03T09:05:05.034Z');

INSERT INTO activities (id, pointsEarned, createdAt)
VALUES ('234-00000206-0', 1000, '2023-05-12T09:05:05.034Z');

INSERT INTO activities (id, pointsEarned, createdAt)
VALUES ('234-00000206-0', 800, '2023-05-15T09:05:05.034Z');

INSERT INTO activities (id, pointsEarned, createdAt)
VALUES ('234-00000206-0', 300, '2023-05-21T09:05:05.034Z');

INSERT INTO activities (id, pointsEarned, createdAt)
VALUES ('234-00000206-0', 1100, '2023-05-28T09:05:05.034Z');

From that data I'm looking for those IDs that have passed 1.000 points and at what date that has happened. In the sample data this would have been May 12th in the second line already.

I tried this SQL statement, but here the result is of course May, 28th.

SELECT 
  id, 
  SUM(pointsEarned) as points, 
  MAX(createdAt) as lastActivity
FROM 
  activities
GROUP BY
  id
HAVING 
  points > 1000;

Then I was thinking about using window functions and tried this one, but getting an error that the statement is wrong. Any idea how to fix the error (see SQL fiddle link above) and how to limit that query to 1.000 points?

SELECT 
  id,
  SUM(pointsEarned) OVER(ORDER BY createdAt) points
FROM activities;

PS: the SQL fiddle uses MySQL. Finally, I need the statement for Google BigQuery though.


Solution

  • Your code doesn't work in MySQL 5.6 because in that version of MySQL window functions were not yet a feature available. Furthermore, you shouldn't test on MySQL if you need a Google Bigquery solution, because they're two deeply different products with different tools.

    Given these premises, in Google Bigquery you have availability of the QUALIFY clause, that allows you to filter over the result of window functions. You can use qualify twice to check:

    • when the running sum of points is higher than 1000
    • when the date corresponds to the first date of your filtered running sum

    by partitioning on "id" (assuming you can have multiple ids), and ordering on your date.

    WITH cte AS (
        SELECT id, 
               SUM(pointsEarned) OVER(PARTITION BY id ORDER BY createdAt) AS runningPoints, 
               createdAt
        FROM activities
        QUALIFY runningPoints > 1000
    )
    SELECT id, runningPoints, createdAt 
    FROM cte 
    QUALIFY createdAt = MIN(createdAt) OVER(PARTITION BY id ORDER BY runningPoints)
    

    This answer was tested on Bigquery and gave the following output:

    id runningPoints createdAt
    234-00000206-0 1500 2023-05-12T09:05:05.034Z