Any idea how to calculate running total in BigQuery SQL?
id value running total
-- ----- -------------
1 1 1
2 2 3
3 4 7
4 7 14
5 9 23
6 12 35
7 13 48
8 16 64
9 22 86
10 42 128
11 57 185
12 58 243
13 59 302
14 60 362
Not a problem for traditional SQL servers using either correlated scalar query:
SELECT a.id, a.value, (SELECT SUM(b.value)
FROM RunTotalTestData b
WHERE b.id <= a.id)
FROM RunTotalTestData a
ORDER BY a.id;
or join:
SELECT a.id, a.value, SUM(b.Value)
FROM RunTotalTestData a,
RunTotalTestData b
WHERE b.id <= a.id
GROUP BY a.id, a.value
ORDER BY a.id;
But I couldn't find a way to make it work in BigQuery...
You probably figured it out already. But here is one, not the most efficient, way:
JOIN can only be done using equality comparisons i.e. b.id <= a.id cannot be used.
https://developers.google.com/bigquery/docs/query-reference#joins
This is pretty lame if you ask me. But there is one work around. Just use equality comparison on some dummy value to get the cartesian product and then use WHERE for <=. This is crazily suboptimal. But if your tables are small this is going to work.
SELECT a.id, SUM(a.value) as rt
FROM RunTotalTestData a
JOIN RunTotalTestData b ON a.dummy = b.dummy
WHERE b.id <= a.id
GROUP BY a.id
ORDER BY rt
You can manually constrain the time as well:
SELECT a.id, SUM(a.value) as rt
FROM (
SELECT id, timestamp RunTotalTestData
WHERE timestamp >= foo
AND timestamp < bar
) AS a
JOIN (
SELECT id, timestamp, value RunTotalTestData
WHERE timestamp >= foo AND timestamp < bar
) b ON a.dummy = b.dummy
WHERE b.id <= a.id
GROUP BY a.id
ORDER BY rt
Update:
You don't need a special property. You can just use
SELECT 1 AS one
and join on that.
As billing goes the join table counts in the processing.