I have a fairly compact dataset which is great for everything except it seems using AutoML to train, so I need to unnest the data in order to be able to run the ML Training on it.
My Data is laid out as 4 columns, the first 3 are each an array of 6 elements:
Or in JSON:
{
"vRMSMin":[245.484,245.571,245.477,245.505,245.47,245.557],
"vRMSMax":[246.63,246.583,246.662,246.601,246.599,246.455],
"eReactiveNegativeKwh":[0.0179,0.0086,0.0142,0.0024,0.0188,0.0097],
"timestamp": "1629861271"
}
I have tried using UNNEST and CROSS JOIN for these but the query wont finish likely due to the huge number of output columns I will have and the amount of data, so I was looking at using PIVOT but cant find enough documentation on how to use it to get the desired result of:
vRMSMin_0,vRMSMin_1,vRMSMin_2,vRMSMin_3,vRMSMin_4,vRMSMin_5,vRMSMax_0,vRMSMax_1,vRMSMax_2,vRMSMax_3,vRMSMax_4,vRMSMax_5,....etc
Please note - this is not the same question as creating a PIVOT table (therefore not a duplicate of all of the other answers around for it that I have searched), I am specifically asking how I can use the PIVOT function on an array to generate columns rather than using rows to create other columns.
I got somewhere around here before my brain melted:
SELECT * FROM (
SELECT vRMSMin, vRMSMax
FROM mydata.data
)
PIVOT
(
MIN( (SELECT SUM(x) FROM UNNEST(vRMSMin) x))
FOR vRMSMin[offset()] in (0,1,2,3,4,5)
)
I am mostly struggling as to how to get the column names and the values, could be that I am asking too much of BigQuery and should look to handle this separately in the client but this would mean double imports, would prefer to be able to handle this via ELT if possible.
Use below approach as an example (easy to be extended with as many extra columns as you need)
with `mydata.data` as (
select 1 id, [245.484,245.571,245.477,245.505,245.47,245.557] as vRMSMin,
[246.63,246.583,246.662,246.601,246.599,246.455] vRMSMax union all
select 2, [145.484,145.571,145.477,145.505,145.47,145.557] ,
[146.63,146.583,146.662,146.601,146.599,146.455]
)
select *
from (
select id, a, b, offset
from `mydata.data`,
unnest(vRMSMin) a with offset
join unnest(vRMSMax) b with offset
using (offset)
)
pivot (min(a) as vRMSMin, min(b) as vRMSMax for offset in (0, 1, 2, 3, 4, 5))
with output