Search code examples
arraysstructgoogle-bigquerynullunnest

BigQuery: How to filter out nulls from ARRAYS of STRUCT


I am having trouble getting my SQL in BigQuery to work when my query contains nulls in an ARRAY of STRUCT. Here is the error I'm getting:

Array cannot have a null element; error in writing field permissions.p.accts; error in writing field permissions.p; error in writing field permissions

I have seen the Google Documentation for the ARRAY_AGG function which suggests using the IGNORE NULLS option, but as far as I can tell, this only works on arrays of scalar values, not on arrays of objects (ARRAY of STRUCT). I have tried a number of different approaches to using the ARRAY_AGG function, and I can't get it to work in the following scenario. So I'm looking for a solution that will allow my query to work even when there are nulls (see the second example below).

Here is an example query that works fine:

with example as (
  select JSON '{"permissions":{"p":[{"accts":["abc"],"perms":["def"]}]}}' as json_data
)

select STRUCT( 
          ARRAY(SELECT AS STRUCT 
                  JSON_VALUE_ARRAY(permission,'$.accts') as accts,
                  JSON_VALUE_ARRAY(permission,'$.perms') as perms
                FROM UNNEST(
                            JSON_QUERY_ARRAY(example.json_data, '$.permissions.p')
                          ) as permission
                ) as p 
             ) as permissions
from example
;

Now here is an example of the same query with a null value, and it throws the error mentioned above. The only difference in this query is that the "accts":["abc"] has been changed to "accts":[null].

with example as (
  select JSON '{"permissions":{"p":[{"accts":[null],"perms":["def"]}]}}' as json_data
)

select STRUCT( 
          ARRAY(SELECT AS STRUCT 
                  JSON_VALUE_ARRAY(permission,'$.accts') as accts,
                  JSON_VALUE_ARRAY(permission,'$.perms') as perms
                FROM UNNEST(
                            JSON_QUERY_ARRAY(example.json_data, '$.permissions.p')
                          ) as permission
                ) as p 
             ) as permissions
from example
;

Help is much appreciated!


Solution

  • After parsing the JSON and obtaining the array, this array needs to be unnested and filtered to eliminate the null entries. This can be done in a user function UDF or in a subSelect.

    Create temp function JSON_VALUE_ARRAY_no_NULL(ARR any type, str string)
    as 
     (( SELECT X from UNNEST(JSON_VALUE_ARRAY(ARR ,str )) X WHERE X IS NOT NULL ))
    ;
    with example as (
      select JSON '{"permissions":{"p":[{"accts":["abc"],"perms":["def"]}]}}' as json_data
      union all select JSON '{"permissions":{"p":[{"accts":[null],"perms":["def"]}]}}' 
    )
    
    select STRUCT( 
              ARRAY(SELECT AS STRUCT 
                      (( SELECT X from UNNEST(JSON_VALUE_ARRAY(permission ,'$.accts' )) X WHERE X IS NOT NULL )) as accts,
                      JSON_VALUE_ARRAY_no_NULL(permission ,'$.accts' ) as accts3,
                      JSON_VALUE_ARRAY_no_NULL(permission,'$.perms') as perms
                    FROM UNNEST(
                                JSON_QUERY_ARRAY(example.json_data, '$.permissions.p')
                              ) as permission
                    ) as p 
                 ) as permissions
    from example