BigQuery SQL running totals

Any idea how to calculate a running total in BigQuery SQL?

id   value   running total
--   -----   -------------
1    1       1
2    2       3
3    4       7
4    7       14
5    9       23
6    12      35
7    13      48
8    16      64
9    22      86
10   42      128
11   57      185
12   58      243
13   59      302
14   60      362

This is not a problem for traditional SQL servers, using either a correlated scalar subquery:

SELECT a.id, a.value, (SELECT SUM(b.value)
                       FROM RunTotalTestData b
                       WHERE b.id <= a.id)
FROM   RunTotalTestData a
ORDER BY a.id;

or join:

SELECT a.id, a.value, SUM(b.value)
FROM   RunTotalTestData a,
       RunTotalTestData b
WHERE b.id <= a.id
GROUP BY a.id, a.value
ORDER BY a.id;

But I couldn't find a way to make it work in BigQuery...

Solution

You have probably figured it out already, but here is one way, though not the most efficient.

In BigQuery, a JOIN condition can only use equality comparisons, so b.id <= a.id cannot be used in the ON clause.

https://developers.google.com/bigquery/docs/query-reference#joins

This is pretty lame if you ask me, but there is a workaround: join on an equality comparison of some dummy value to get the Cartesian product, then apply the <= condition in WHERE. This is crazily suboptimal, since every row gets paired with every other row before filtering, but it will work if your tables are small.

SELECT a.id, SUM(b.value) AS rt
FROM RunTotalTestData a
JOIN RunTotalTestData b ON a.dummy = b.dummy
WHERE b.id <= a.id
GROUP BY a.id
ORDER BY a.id

You can also restrict the time range manually, which keeps the Cartesian product smaller:

-- foo and bar are placeholders for the lower and upper timestamp bounds
SELECT a.id, SUM(b.value) AS rt
FROM (
    SELECT id, timestamp, dummy
    FROM RunTotalTestData
    WHERE timestamp >= foo
    AND timestamp < bar
) AS a
JOIN (
    SELECT id, timestamp, value, dummy
    FROM RunTotalTestData
    WHERE timestamp >= foo
    AND timestamp < bar
) AS b ON a.dummy = b.dummy
WHERE b.id <= a.id
GROUP BY a.id
ORDER BY a.id

Update:

You don't need a special dummy column in the table. In each subquery you can just use

SELECT 1 AS one

and join on that.
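
If you want to sanity-check the constant-key trick without touching BigQuery, here is a minimal sketch that runs the same query shape against the sample data from the question. It uses Python's built-in sqlite3 module purely as a stand-in (an assumption for illustration; SQLite is not BigQuery and has no equality-only restriction), and the helper name running_totals is made up for this example.

```python
import sqlite3

# Sample data from the question: (id, value)
ROWS = [(1, 1), (2, 2), (3, 4), (4, 7), (5, 9), (6, 12), (7, 13),
        (8, 16), (9, 22), (10, 42), (11, 57), (12, 58), (13, 59), (14, 60)]

# Same shape as the workaround: equality join on a constant key,
# with the inequality moved into WHERE.
QUERY = """
SELECT a.id, a.value, SUM(b.value) AS rt
FROM (SELECT id, value, 1 AS one FROM RunTotalTestData) AS a
JOIN (SELECT id, value, 1 AS one FROM RunTotalTestData) AS b
  ON a.one = b.one
WHERE b.id <= a.id
GROUP BY a.id, a.value
ORDER BY a.id
"""


def running_totals(rows):
    """Load rows into an in-memory SQLite table and return (id, value, rt) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE RunTotalTestData (id INTEGER, value INTEGER)")
        conn.executemany("INSERT INTO RunTotalTestData VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in running_totals(ROWS):
        print(row)
```

Note that the sum is taken over b.value: for each row of a, the rows of b that survive the WHERE filter are exactly those with a smaller or equal id, so their values add up to the running total.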

As far as billing goes, the joined table counts toward the data processed.