Search code examples
google-bigqueryconcatenationetlarray-mergesql-merge

BigQuery: Concatenate two arrays and keep distinct values within MERGE statement


I am working on a MERGE process and update an array field with new data but only if the value isn't already found in the array.

target table
+-----+----------+
| id  |  arr_col |
+-----+----------+
| a   |  [1,2,3] |
| b   |    [0]   |
+-----+----------+

source table
+-----+----------+
| id  |  arr_col |
+-----+----------+
| a   | [3,4,5] |
| b   |  [0,0]   |
+-----+----------+

target table post-merge
+-----+-------------+
| id  |  arr_col    |
+-----+-------------+
| a   | [1,2,3,4,5] |
| b   |    [0]      |
+-----+-------------+

I was trying to use SQL on this answer in my MERGE statement

merge into target t
using source
  on target.id = source.id
when matched then
update set target.arr_col = array(
                             select distinct x 
                             from unnest(array_concat(target.arr_col, source.arr_col)) x
                            ) 

but BigQuery shows me the following error: Correlated Subquery is unsupported in UPDATE clause.

Is there any other way to update this array field via MERGE? The target and source tables can be quite large and would run daily. So it's a process I would like to have incremental updates for as opposed to recreating entire table with new data every time.


Solution

  • Below is for BigQuery Standard SQL

    merge into target
    using (
      select id, 
        array(
          select distinct x 
          from unnest(source.arr_col || target.arr_col) as x
          order by x
        ) as arr_col
      from source 
      join target
      using(id)
    ) source
      on target.id = source.id
    when matched then
    update set target.arr_col = source.arr_col;