I'm learning BigQuery standard SQL at the moment.
I've followed the examples of how you flatten data with legacy SQL, here and here.
It makes sense that flattening data generates stacked data with similar rows.
However, in the documentation which describes moving from Legacy SQL to Standard SQL, it describes how to do an identical flatten operation. The sub-heading for this is called Removing Repitition with Flatten.
Could someone explain that to me? It seems like flatten generates repetition of data by turning nested data into stacked data and I feel like I'm not understanding something basic.
The "repetition" being removed is that some particular column. Suppose you have data that looks like this, where x
and y
have type INT64
and z
is an ARRAY<INT64>
:
x y z
-----------------
1 2 [3, 4, 5]
6 7 [8, 9]
Flattening in this case means removing the repetition of z
so that you end up only with INT64
column types in the output without an ARRAY
type, which converts the data to the following shape:
x y z
---------
1 2 3
1 2 4
1 2 5
6 7 8
6 7 9
Note that we are actually adding repetition in that the values of x
and y
are being duplicated across rows, but we have flattened the repeated structure of z
. In query form, you can try the following examples. This first one simply shows the initial shape of the data:
WITH Input AS (
SELECT 1 AS x, 2 AS y, [3, 4, 5] AS z UNION ALL
SELECT 6, 7, [8, 9]
)
SELECT x, y, z
FROM Input;
This example shows the data after "flattening" z
:
WITH Input AS (
SELECT 1 AS x, 2 AS y, [3, 4, 5] AS z UNION ALL
SELECT 6, 7, [8, 9]
)
SELECT x, y, z
FROM Input AS t,
t.z AS z;