I'm using standard SQL on BigQuery to create a new table based on certain conditions within an existing table. I have multiple WHEN clauses to support this (as there are several different conditions I'm checking for). What I now want to do is have multiple THEN clauses within those WHEN statements, as I'm aiming to add more than one column.
Specifically, I want to add a concatenation of two existing text fields as one field, and then an aggregated array of three existing fields as one field:
CASE WHEN
# all three match
one_x1 = two_x1 = three_x1 THEN CONCAT( object1_name, ", ", object2_name, ", ", object3_name ) AND ARRAY_AGG(STRUCT(score_one, score_two, score_three))
# one and two match
WHEN one_x1 = two_x1 THEN CONCAT( object1_name, ", ", object2_name ) AND ARRAY_AGG(STRUCT(score_one, score_two))
# one and three match
WHEN one_x1 = three_x1 THEN CONCAT( object1_name, ", ", object3_name ) AND ARRAY_AGG(STRUCT(score_one, score_three))
# two and three match
WHEN two_x1 = three_x1 THEN CONCAT( object2_name, ", ", object3_name ) AND ARRAY_AGG(STRUCT(score_two, score_three))
ELSE
NULL
END
It is the 'AND ARRAY_AGG(STRUCT(xxxxx))' part that does not work, I have also tried using commas to separate the THEN clauses.
Is the only option to repeat the same case statement to separate the THEN clauses individually?
Sample data: sample_data desired outcome for row 1 of sample data: here
Below is for BigQuery Standard SQL
First, let's correct your initial query so that it actually produces expected result
#standardSQL
SELECT id,
CASE
WHEN one_x1 = two_x1 AND one_x1 = three_x1 THEN CONCAT( object1_name, ", ", object2_name, ", ", object3_name )
WHEN one_x1 = two_x1 THEN CONCAT( object1_name, ", ", object2_name )
WHEN one_x1 = three_x1 THEN CONCAT( object1_name, ", ", object3_name )
WHEN two_x1 = three_x1 THEN CONCAT( object2_name, ", ", object3_name )
ELSE NULL
END AS field1,
CASE
WHEN one_x1 = two_x1 AND one_x1 = three_x1 THEN [score_one, score_two, score_three]
WHEN one_x1 = two_x1 THEN [score_one, score_two]
WHEN one_x1 = three_x1 THEN [score_one, score_three]
WHEN two_x1 = three_x1 THEN [score_two, score_three]
ELSE NULL
END AS field2
FROM `project.dataset.table`
If to apply to sample data from your question - result is
Row id field1 field2
1 1 Dog, Animal 0.82
0.72
2 2 Horse, Animal, Bird 0.76
0.73
0.9
3 3 Dog, Animal, Chicken 0.67
0.75
0.65
4 4 Bird, Chicken 0.87
0.86
Next, as I understand, you want to avoid repeating same conditions again and again in your CASE - for this - you can use below trick
#standardSQL
SELECT id, fields.* FROM (
SELECT id,
CASE
WHEN one_x1 = two_x1 AND one_x1 = three_x1 THEN
STRUCT(CONCAT( object1_name, ", ", object2_name, ", ", object3_name) AS field1, [score_one, score_two, score_three] AS field2)
WHEN one_x1 = two_x1 THEN
STRUCT(CONCAT( object1_name, ", ", object2_name ) AS field1, [score_one, score_two] AS field2)
WHEN one_x1 = three_x1 THEN
STRUCT(CONCAT( object1_name, ", ", object3_name ) AS field1, [score_one, score_three] AS field2)
WHEN two_x1 = three_x1 THEN
STRUCT(CONCAT( object2_name, ", ", object3_name ) AS field1, [score_two, score_three] AS field2)
ELSE NULL
END AS fields
FROM `project.dataset.table`
)
obviously with the same output ...
And, finally, as yet another option for yo - you can eliminate all those case/when/then with below approach
#standardSQL
SELECT id,
(SELECT STRING_AGG(object) FROM UNNEST(objects) object WITH OFFSET
JOIN UNNEST(pos) OFFSET USING(OFFSET)
) field1,
(SELECT ARRAY_AGG(score) FROM UNNEST(scores) score WITH OFFSET
JOIN UNNEST(pos) OFFSET USING(OFFSET)
) field2
FROM (
SELECT id,
[object1_name, object2_name, object3_name] objects,
[score_one, score_two, score_three] scores,
(SELECT ARRAY_AGG(OFFSET)
FROM UNNEST([one_x1, two_x1, three_x1]) x WITH OFFSET
GROUP BY x HAVING COUNT(1) > 1
) pos
FROM `project.dataset.table`
)
again with the same output