I have two nested tables one is source and another is target table. I wanted to compare nested columns of source and target table. I am comparing two tables to check weather data is being updated in source table or not. Is there any sql in BigQuery to achieve the same?
Here are my approaches which I was previously doing to compare two tables with nested record:
1. This is the first approach:
SELECT to_json_string(info) FROM database.nested_table_source
except distinct
SELECT to_json_string(info) FROM nested_table_target
to_json_string() is not working, since this function returns different sequence of source rows and target rows sometimes, even though data is same in both tables, it results out different records.
2. This is the second approach:
select name
from dataset.nested_table_source a
join dataset.nested_table_target b
using(name)
where
a.name!=b.name and
(select string_agg(format('%t', s) order by key) from a.info s)
!= (select string_agg(format('%t', s) order by key ) from b.info s)
In this approach I am using string_agg function to compare two nested records. But I am not sure if that is the correct way to compare record fields.
What should I do in this case?
Here is an approach in which you basically stringify an ordered set of objects (or the info
column in your tables), and then compare them against each other.
Here is an example with some dummy data:
with source_data as (
select
"VICTOR" as name,
array[
struct("A" as key, 3 as value),
struct("B" as key, 5 as value)
] as info
union all
select
"MAX" as name,
array[
struct("A" as key, 0 as value),
struct("B" as key, 1 as value)
] as info
union all
select
"SAIF" as name,
array[
struct("A" as key, 0 as value),
struct("B" as key, 1 as value)
] as info
),
target_data as (
select
"VICTOR" as name,
array[
struct("A" as key, 3 as value),
struct("B" as key, 15 as value)
] as info
union all
select
"MAX" as name,
array[
struct("A" as key, 0 as value),
struct("B" as key, 1 as value)
] as info
)
select name, stringified_source_set as info from (
select
s.name,
array_to_string(array(select concat(cast(x.key as string), '|', cast(x.value as string)) from unnest(t.info) as x order by x.key), '|') AS stringified_target_set,
array_to_string(array(select concat(cast(x.key as string), '|', cast(x.value as string)) from unnest(s.info) as x order by x.key), '|') AS stringified_source_set
from source_data s
left join target_data t on t.name = s.name
)
where (stringified_source_set != stringified_target_set) or (name is null)
Note that the above approach does achieve both "horizontal comparison" (i.e. comparing both info
objects) and "vertical comparison" (i.e. comparing missing entries in the target table which exist in the source table).