I am working on a MERGE process and update an array field with new data but only if the value isn't already found in the array.
target table
+-----+----------+
| id | arr_col |
+-----+----------+
| a | [1,2,3] |
| b | [0] |
+-----+----------+
source table
+-----+----------+
| id | arr_col |
+-----+----------+
| a | [3,4,5] |
| b | [0,0] |
+-----+----------+
target table post-merge
+-----+-------------+
| id | arr_col |
+-----+-------------+
| a | [1,2,3,4,5] |
| b | [0] |
+-----+-------------+
I was trying to use SQL on this answer in my MERGE statement
merge into target t
using source
on target.id = source.id
when matched then
update set target.arr_col = array(
select distinct x
from unnest(array_concat(target.arr_col, source.arr_col)) x
)
but BigQuery shows me the following error:
Correlated Subquery is unsupported in UPDATE clause.
Is there any other way to update this array field via MERGE? The target and source tables can be quite large and would run daily. So it's a process I would like to have incremental updates for as opposed to recreating entire table with new data every time.
Below is for BigQuery Standard SQL
merge into target
using (
select id,
array(
select distinct x
from unnest(source.arr_col || target.arr_col) as x
order by x
) as arr_col
from source
join target
using(id)
) source
on target.id = source.id
when matched then
update set target.arr_col = source.arr_col;