Suppose I have two tables: intervals
contains index intervals (its columns are i_min
and i_max
) and values
contains indexed values (with columns i
and x
). Here's an example:
values: intervals:
+---+---+ +-------+-------+
| i | x | | i_min | i_max |
+-------+ +---------------+
| 1 | 1 | | 1 | 4 |
| 2 | 0 | | 6 | 6 |
| 3 | 4 | | 6 | 6 |
| 4 | 9 | | 6 | 6 |
| 6 | 7 | | 7 | 9 |
| 7 | 2 | | 12 | 17 |
| 8 | 2 | +-------+-------+
| 9 | 2 |
+---+---+
I want to sum the values of x for each interval:
result:
+-------+-------+-----+
| i_min | i_max | sum |
+---------------------+
| 1 | 4 | 13 | // 1+0+4+9
| 6 | 6 | 7 |
| 6 | 6 | 7 |
| 6 | 6 | 7 |
| 7 | 9 | 6 | // 2+2+2
| 12 | 17 | 0 |
+-------+-------+-----+
In some SQL engines, this could be done using:
SELECT
i_min,
i_max,
(SELECT SUM(x)
FROM values
WHERE i BETWEEN intervals.i_min AND intervals.i_max) AS sum_x
FROM
intervals
except that type of query is not allowed by BigQuery ("Subselect not allowed in SELECT clause." or "LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join." depending on the syntax used).
There must be a way to do this with window functions, but I can't figure out how — all examples I've seen have the partition as part of the table. Is there an option that doesn't use CROSS JOIN? If not, what's the most efficient way to do this CROSS JOIN?
Some notes on my data:
intervals
, not in i
.intervals
are either the same, either entirely disjoint (no overlaps).i
(so it forms a partition of this space).Try below - BigQuery Standard SQL
#standardSQL
SELECT
i_min, i_max, SUM(x) AS sum_x
FROM (
SELECT i_min, i_max, ROW_NUMBER() OVER() AS line FROM `project.dataset.intervals`
) AS intervals
JOIN (SELECT i, x FROM `project.dataset.values` UNION ALL SELECT NULL, 0) AS values
ON values.i BETWEEN intervals.i_min AND intervals.i_max OR values.i IS NULL
GROUP BY i_min, i_max, line
-- ORDER BY i_min
you can play/test with dummy data as below
#standardSQL
WITH intervals AS (
SELECT 1 AS i_min, 4 AS i_max UNION ALL
SELECT 6, 6 UNION ALL
SELECT 6, 6 UNION ALL
SELECT 6, 6 UNION ALL
SELECT 7, 9 UNION ALL
SELECT 12, 17
),
values AS (
SELECT 1 AS i, 1 AS x UNION ALL
SELECT 2, 0 UNION ALL
SELECT 3, 4 UNION ALL
SELECT 4, 9 UNION ALL
SELECT 6, 7 UNION ALL
SELECT 7, 2 UNION ALL
SELECT 8, 2 UNION ALL
SELECT 9, 2
)
SELECT
i_min, i_max, SUM(x) AS sum_x
FROM (SELECT i_min, i_max, ROW_NUMBER() OVER() AS line FROM intervals) AS intervals
JOIN (SELECT i, x FROM values UNION ALL SELECT NULL, 0) AS values
ON values.i BETWEEN intervals.i_min AND intervals.i_max OR values.i IS NULL
GROUP BY i_min, i_max, line
-- ORDER BY i_min