Search code examples
sqlconditional-statementscase-statementgoogle-bigquery

How to create multiple 'THEN' clauses for a BigQuery standard SQL case statement?


I'm using standard SQL on BigQuery to create a new table based on certain conditions within an existing table. I have multiple WHEN clauses to support this (as there are several different conditions I'm checking for). What I now want to do is have multiple THEN clauses within those WHEN statements, as I'm aiming to add more than one column.

Specifically, I want to add a concatenation of two existing text fields as one field, and then an aggregated array of three existing fields as one field:

CASE WHEN
    # all three match
    one_x1 = two_x1 = three_x1 THEN CONCAT( object1_name, ", ", object2_name, ", ", object3_name ) AND ARRAY_AGG(STRUCT(score_one, score_two, score_three))
    # one and two match
      WHEN one_x1 = two_x1 THEN CONCAT( object1_name, ", ", object2_name ) AND ARRAY_AGG(STRUCT(score_one, score_two))
    # one and three match
      WHEN one_x1 = three_x1 THEN CONCAT( object1_name, ", ", object3_name ) AND ARRAY_AGG(STRUCT(score_one, score_three))
    # two and three match
      WHEN two_x1 = three_x1 THEN CONCAT( object2_name, ", ", object3_name ) AND ARRAY_AGG(STRUCT(score_two, score_three))
   ELSE
    NULL
 END

It is the 'AND ARRAY_AGG(STRUCT(xxxxx))' part that does not work, I have also tried using commas to separate the THEN clauses.

Is the only option to repeat the same case statement to separate the THEN clauses individually?

Sample data: sample_data desired outcome for row 1 of sample data: here


Solution

  • Below is for BigQuery Standard SQL

    First, let's correct your initial query so that it actually produces expected result

    #standardSQL
    SELECT id, 
    CASE 
        WHEN one_x1 = two_x1 AND one_x1 = three_x1 THEN CONCAT( object1_name, ", ", object2_name, ", ", object3_name )
        WHEN one_x1 = two_x1 THEN CONCAT( object1_name, ", ", object2_name )
        WHEN one_x1 = three_x1 THEN CONCAT( object1_name, ", ", object3_name )
        WHEN two_x1 = three_x1 THEN CONCAT( object2_name, ", ", object3_name )
        ELSE NULL
    END AS field1,
    CASE 
        WHEN one_x1 = two_x1 AND one_x1 = three_x1 THEN [score_one, score_two, score_three]
        WHEN one_x1 = two_x1 THEN [score_one, score_two]
        WHEN one_x1 = three_x1 THEN [score_one, score_three]
        WHEN two_x1 = three_x1 THEN [score_two, score_three]
        ELSE NULL
    END AS field2
    FROM `project.dataset.table`
    

    If to apply to sample data from your question - result is

    Row id  field1                  field2   
    1   1   Dog, Animal             0.82     
                                    0.72     
    2   2   Horse, Animal, Bird     0.76     
                                    0.73     
                                    0.9  
    3   3   Dog, Animal, Chicken    0.67     
                                    0.75     
                                    0.65     
    4   4   Bird, Chicken           0.87     
                                    0.86       
    

    Next, as I understand, you want to avoid repeating same conditions again and again in your CASE - for this - you can use below trick

    #standardSQL
    
    SELECT id, fields.* FROM (
      SELECT id, 
      CASE 
          WHEN one_x1 = two_x1 AND one_x1 = three_x1 THEN 
            STRUCT(CONCAT( object1_name, ", ", object2_name, ", ", object3_name) AS field1, [score_one, score_two, score_three] AS field2)
          WHEN one_x1 = two_x1 THEN 
            STRUCT(CONCAT( object1_name, ", ", object2_name ) AS field1, [score_one, score_two] AS field2)
          WHEN one_x1 = three_x1 THEN 
            STRUCT(CONCAT( object1_name, ", ", object3_name ) AS field1, [score_one, score_three] AS field2)
          WHEN two_x1 = three_x1 THEN 
            STRUCT(CONCAT( object2_name, ", ", object3_name ) AS field1, [score_two, score_three] AS field2)
          ELSE NULL
      END AS fields
      FROM `project.dataset.table`
    )
    

    obviously with the same output ...

    And, finally, as yet another option for yo - you can eliminate all those case/when/then with below approach

    #standardSQL
    SELECT id, 
      (SELECT STRING_AGG(object) FROM UNNEST(objects) object WITH OFFSET
        JOIN UNNEST(pos) OFFSET USING(OFFSET)
      ) field1,
        (SELECT ARRAY_AGG(score) FROM UNNEST(scores) score WITH OFFSET
        JOIN UNNEST(pos) OFFSET USING(OFFSET)
      ) field2
    FROM (
      SELECT id, 
        [object1_name, object2_name, object3_name] objects,
        [score_one, score_two, score_three] scores,
        (SELECT ARRAY_AGG(OFFSET) 
          FROM UNNEST([one_x1, two_x1, three_x1]) x WITH OFFSET 
          GROUP BY x HAVING COUNT(1) > 1
        ) pos
      FROM `project.dataset.table`
    )
    

    again with the same output