I'm having the following data in an SQL table. See SQL fiddle for playing around.
id | pointsEarned | createdAt |
---|---|---|
234-00000206-0 | 500 | 2023-05-03T09:05:05.034Z |
234-00000206-0 | 1000 | 2023-05-12T09:05:05.034Z |
234-00000206-0 | 800 | 2023-05-15T09:05:05.034Z |
234-00000206-0 | 300 | 2023-05-21T09:05:05.034Z |
234-00000206-0 | 1100 | 2023-05-28T09:05:05.034Z |
The data is created as follows.
CREATE TABLE activities (
id varchar(14),
pointsEarned int,
createdAt varchar(24)
);
INSERT INTO activities (id, pointsEarned, createdAt)
VALUES ('234-00000206-0', 500, '2023-05-03T09:05:05.034Z');
INSERT INTO activities (id, pointsEarned, createdAt)
VALUES ('234-00000206-0', 1000, '2023-05-12T09:05:05.034Z');
INSERT INTO activities (id, pointsEarned, createdAt)
VALUES ('234-00000206-0', 800, '2023-05-15T09:05:05.034Z');
INSERT INTO activities (id, pointsEarned, createdAt)
VALUES ('234-00000206-0', 300, '2023-05-21T09:05:05.034Z');
INSERT INTO activities (id, pointsEarned, createdAt)
VALUES ('234-00000206-0', 1100, '2023-05-28T09:05:05.034Z');
From that data I'm looking for those IDs that have passed 1.000 points and at what date that has happened. In the sample data this would have been May 12th in the second line already.
I tried this SQL statement, but here the result is of course May, 28th.
SELECT
id,
SUM(pointsEarned) as points,
MAX(createdAt) as lastActivity
FROM
activities
GROUP BY
id
HAVING
points > 1000;
Then I was thinking about using window functions and tried this one, but getting an error that the statement is wrong. Any idea how to fix the error (see SQL fiddle link above) and how to limit that query to 1.000 points?
SELECT
id,
SUM(pointsEarned) OVER(ORDER BY createdAt) points
FROM activities;
PS: the SQL fiddle uses MySQL. Finally, I need the statement for Google BigQuery though.
Your code doesn't work in MySQL 5.6 because in that version of MySQL window functions were not yet a feature available. Furthermore, you shouldn't test on MySQL if you need a Google Bigquery solution, because they're two deeply different products with different tools.
Given these premises, in Google Bigquery you have availability of the QUALIFY
clause, that allows you to filter over the result of window functions. You can use qualify twice to check:
by partitioning on "id" (assuming you can have multiple ids), and ordering on your date.
WITH cte AS (
SELECT id,
SUM(pointsEarned) OVER(PARTITION BY id ORDER BY createdAt) AS runningPoints,
createdAt
FROM activities
QUALIFY runningPoints > 1000
)
SELECT id, runningPoints, createdAt
FROM cte
QUALIFY createdAt = MIN(createdAt) OVER(PARTITION BY id ORDER BY runningPoints)
This answer was tested on Bigquery and gave the following output:
id | runningPoints | createdAt |
---|---|---|
234-00000206-0 | 1500 | 2023-05-12T09:05:05.034Z |