I am trying to create two arrays of structures that are independent of each other.
I am trying to get these arrays created from another table that is an import of a flat file with different record types depending on the data (Rec0015 - Earnings and Rec0025 - Deductions). Each record type has a slightly different layout, so I import the record data as a string that is "~" delimited. I have simplified my examples below to only show the basic data needed to illustrate my issues.
If I run the queries independently of each other each array is created properly; however, when I combine both I cannot get it to work.
For the first query:
SELECT
flat.EmployeeNumber,
ARRAY_AGG(STRUCT(
flat.RecordNumberEarn,
flat.EmployeeEarningsAmount
))
as EmployeeEarningsDetail
FROM
(SELECT DISTINCT
stage.EmployeeNumber,
Rec0015.RecordNumberEarn,
Rec0015.EmployeeEarningsAmount,
FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging stage
INNER JOIN
(SELECT EmployeeNumber,
SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberEarn,
SPLIT(VariableData, '~')[SAFE_OFFSET(4)] AS EmployeeEarningsAmount
FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
WHERE RecordTypeNumber = '0015'
) Rec0015
ON stage.EmployeeNumber = Rec0015.EmployeeNumber
) as flat
GROUP BY
flat.EmployeeNumber
I get the following result (which is correct):
EmployeeNumber EmployeeEarningsDetail
xxxx521 "{
""EmployeeEarningsDetail"": [{
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}]
}" ....
When I run the "full" query:
SELECT
flat2.EmployeeNumber,
ARRAY_AGG(STRUCT(
Rec0025.RecordNumberDed,
Rec0025.EmployeePayDeductionAmount
))
as EmployeeDeductionsDetail
FROM
(SELECT
flat.EmployeeNumber,
ARRAY_AGG(STRUCT(
flat.RecordNumberEarn,
flat.EmployeeEarningsAmount
))
as EmployeeEarningsDetail
FROM
(SELECT DISTINCT
stage.EmployeeNumber,
Rec0015.RecordNumberEarn,
Rec0015.EmployeeEarningsAmount,
FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging stage
INNER JOIN
(SELECT EmployeeNumber,
SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberEarn,
SPLIT(VariableData, '~')[SAFE_OFFSET(4)] AS EmployeeEarningsAmount
FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
WHERE RecordTypeNumber = '0015'
) Rec0015
ON stage.EmployeeNumber = Rec0015.EmployeeNumber
) as flat
GROUP BY
flat.EmployeeNumber
) as flat2
INNER JOIN
(SELECT DISTINCT EmployeeNumber,
SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberDed,
SPLIT(VariableData, '~')[SAFE_OFFSET(5)] AS EmployeePayDeductionAmount
FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
WHERE RecordTypeNumber = '0025'
) Rec0025
ON flat2.EmployeeNumber = Rec0025.EmployeeNumber
GROUP BY
flat2.EmployeeNumber
I get the following result (which is correct, but doesn't include the first array structure):
EmployeeNumber EmployeeDeductionsDetail
xxxx521 "{
""EmployeeDeductionsDetail"": [{
""RecordNumberDed"": ""0001"",
""EmployeePayDeductionAmount"": "" 50.65""
}, {
""RecordNumberDed"": ""0002"",
""EmployeePayDeductionAmount"": "" 44.15""
}, {
""RecordNumberDed"": ""0003"",
""EmployeePayDeductionAmount"": "" 44.15""
}, {
""RecordNumberDed"": ""0004"",
""EmployeePayDeductionAmount"": "" 10.33""
}, {
""RecordNumberDed"": ""0005"",
""EmployeePayDeductionAmount"": "" 10.33""
}, {
""RecordNumberDed"": ""0006"",
""EmployeePayDeductionAmount"": "" 61.54""
}, {
""RecordNumberDed"": ""0007"",
""EmployeePayDeductionAmount"": "" 13.22""
}, {
""RecordNumberDed"": ""0008"",
""EmployeePayDeductionAmount"": "" 7.84""
}, {
""RecordNumberDed"": ""0009"",
""EmployeePayDeductionAmount"": "" 0.69""
}, {
""RecordNumberDed"": ""0010"",
""EmployeePayDeductionAmount"": "" 5.00""
}]
}" ...
However, in the 2nd query, I really want to do a group by on the array structure EmployeeEarningsDetail, but when I add the array to the select and group by I get the error:
"Grouping by expressions of type ARRAY is not allowed."
I tried adding a TO_JSON_STRING(EmployeeEarningsDetail) in both the Select and the Group by, but I got a column of just the string not as an array as below:
SELECT
flat2.EmployeeNumber,
TO_JSON_STRING(flat2.EmployeeEarningsDetail),
ARRAY_AGG(STRUCT(
Rec0025.RecordNumberDed,
Rec0025.EmployeePayDeductionAmount
))
as EmployeeDeductionsDetail
FROM
(SELECT
flat.EmployeeNumber,
ARRAY_AGG(STRUCT(
flat.RecordNumberEarn,
flat.EmployeeEarningsAmount
))
as EmployeeEarningsDetail
FROM
(SELECT DISTINCT
stage.EmployeeNumber,
Rec0015.RecordNumberEarn,
Rec0015.EmployeeEarningsAmount,
FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging stage
INNER JOIN
(SELECT EmployeeNumber,
SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberEarn,
SPLIT(VariableData, '~')[SAFE_OFFSET(4)] AS EmployeeEarningsAmount
FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
WHERE RecordTypeNumber = '0015'
) Rec0015
ON stage.EmployeeNumber = Rec0015.EmployeeNumber
) as flat
GROUP BY
flat.EmployeeNumber
) as flat2
INNER JOIN
(SELECT DISTINCT EmployeeNumber,
SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberDed,
SPLIT(VariableData, '~')[SAFE_OFFSET(5)] AS EmployeePayDeductionAmount
FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
WHERE RecordTypeNumber = '0025'
) Rec0025
ON flat2.EmployeeNumber = Rec0025.EmployeeNumber
GROUP BY
flat2.EmployeeNumber,
TO_JSON_STRING(flat2.EmployeeEarningsDetail)
The results were (not correct at all) are below the JSON shows an f0_ string of all the earnings and the first Deduction followed by another row with the rest of the deductions in an array See below:
EmployeeNumber f0_ EmployeeDeductionsDetail
xxxx521 "[{""RecordNumberEarn"":""0001"",""EmployeeEarningsAmount"":"" 375.52""},{""RecordNumberEarn"":""0002"",""EmployeeEarningsAmount"":"" 387.26""}]" "{
""EmployeeDeductionsDetail"": [{
""RecordNumberDed"": ""0001"",
""EmployeePayDeductionAmount"": "" 50.65""
}, {
""RecordNumberDed"": ""0002"",
""EmployeePayDeductionAmount"": "" 44.15""
}, {
""RecordNumberDed"": ""0003"",
""EmployeePayDeductionAmount"": "" 44.15""
}, {
""RecordNumberDed"": ""0004"",
""EmployeePayDeductionAmount"": "" 10.33""
}, {
""RecordNumberDed"": ""0005"",
""EmployeePayDeductionAmount"": "" 10.33""
}, {
""RecordNumberDed"": ""0006"",
""EmployeePayDeductionAmount"": "" 61.54""
}, {
""RecordNumberDed"": ""0007"",
""EmployeePayDeductionAmount"": "" 13.22""
}, {
""RecordNumberDed"": ""0008"",
""EmployeePayDeductionAmount"": "" 7.84""
}, {
""RecordNumberDed"": ""0009"",
""EmployeePayDeductionAmount"": "" 0.69""
}, {
""RecordNumberDed"": ""0010"",
""EmployeePayDeductionAmount"": "" 5.00""
}]
}" ...
I have tried to put both ARRAY_AGGs in the same SELECT my arrays are cartesian products:
The SQL is:
SELECT
flat.EmployeeNumber,
ARRAY_AGG(STRUCT(
flat.RecordNumberEarn,
flat.EmployeeEarningsAmount
))
as EmployeeEarningsDetail,
ARRAY_AGG(STRUCT(
flat.RecordNumberDed,
flat.EmployeePayDeductionAmount
))
as EmployeeDeductionsDetail
FROM
(SELECT DISTINCT
stage.EmployeeNumber,
Rec0015.RecordNumberEarn,
Rec0015.EmployeeEarningsAmount,
Rec0025.RecordNumberDed,
Rec0025.EmployeePayDeductionAmount
FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging stage
INNER JOIN
(SELECT EmployeeNumber,
SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberEarn,
SPLIT(VariableData, '~')[SAFE_OFFSET(4)] AS EmployeeEarningsAmount
FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
WHERE RecordTypeNumber = '0015'
) Rec0015
ON stage.EmployeeNumber = Rec0015.EmployeeNumber
INNER JOIN
(SELECT DISTINCT EmployeeNumber,
SPLIT(VariableData, '~')[SAFE_OFFSET(0)] AS RecordNumberDed,
SPLIT(VariableData, '~')[SAFE_OFFSET(5)] AS EmployeePayDeductionAmount
FROM bq_hpm_ppm_dev.EmployeeGrossToNetPayStaging
WHERE RecordTypeNumber = '0025'
) Rec0025
ON stage.EmployeeNumber = Rec0025.EmployeeNumber
) as flat
GROUP BY
flat.EmployeeNumber
and the results are:
EmployeeNumber EmployeeEarningsDetail EmployeeDeductionsDetail
xxxx521 "{
""EmployeeEarningsDetail"": [{
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0001"",
""EmployeeEarningsAmount"": "" 375.52""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}, {
""RecordNumberEarn"": ""0002"",
""EmployeeEarningsAmount"": "" 387.26""
}]
}" "{
""EmployeeDeductionsDetail"": [{
""RecordNumberDed"": ""0001"",
""EmployeePayDeductionAmount"": "" 50.65""
}, {
""RecordNumberDed"": ""0002"",
""EmployeePayDeductionAmount"": "" 44.15""
}, {
""RecordNumberDed"": ""0003"",
""EmployeePayDeductionAmount"": "" 44.15""
}, {
""RecordNumberDed"": ""0004"",
""EmployeePayDeductionAmount"": "" 10.33""
}, {
""RecordNumberDed"": ""0005"",
""EmployeePayDeductionAmount"": "" 10.33""
}, {
""RecordNumberDed"": ""0006"",
""EmployeePayDeductionAmount"": "" 61.54""
}, {
""RecordNumberDed"": ""0007"",
""EmployeePayDeductionAmount"": "" 13.22""
}, {
""RecordNumberDed"": ""0008"",
""EmployeePayDeductionAmount"": "" 7.84""
}, {
""RecordNumberDed"": ""0009"",
""EmployeePayDeductionAmount"": "" 0.69""
}, {
""RecordNumberDed"": ""0010"",
""EmployeePayDeductionAmount"": "" 5.00""
}, {
""RecordNumberDed"": ""0001"",
""EmployeePayDeductionAmount"": "" 50.65""
}, {
""RecordNumberDed"": ""0002"",
""EmployeePayDeductionAmount"": "" 44.15""
}, {
""RecordNumberDed"": ""0003"",
""EmployeePayDeductionAmount"": "" 44.15""
}, {
""RecordNumberDed"": ""0004"",
""EmployeePayDeductionAmount"": "" 10.33""
}, {
""RecordNumberDed"": ""0005"",
""EmployeePayDeductionAmount"": "" 10.33""
}, {
""RecordNumberDed"": ""0006"",
""EmployeePayDeductionAmount"": "" 61.54""
}, {
""RecordNumberDed"": ""0007"",
""EmployeePayDeductionAmount"": "" 13.22""
}, {
""RecordNumberDed"": ""0008"",
""EmployeePayDeductionAmount"": "" 7.84""
}, {
""RecordNumberDed"": ""0009"",
""EmployeePayDeductionAmount"": "" 0.69""
}, {
""RecordNumberDed"": ""0010"",
""EmployeePayDeductionAmount"": "" 5.00""
}]
}" ...
Any suggestions on how I can "fix" this.
Thanks, David
A workaround for GROUP BY
when the type does not support aggregation, e.g. STRUCT
or GEOGRAPHY
is to turn it to a STRING
key. TO_JSON_STRING
can do it for STRUCT, ST_AsText
can do it for GEOGRAPHY, etc. Note this might not be very performant.
To get the original value, without any modification, use ANY_VALUE
aggregation function - it picks arbitrary value from its inputs, here all the values are presumably the same, so we don't care which one.