I am trying to figure out how to query where I am checking the value of usage
given the following table creation:
CREATE EXTERNAL TABLE IF NOT EXISTS foo.test (
`id` string,
`foo` struct< usages:array< struct< usage:string,
method_id:int,
start_at:string,
end_at:string,
location:array<string> >>>
) PARTITIONED BY (
timestamp date
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1' ) LOCATION 's3://foo.bar/' TBLPROPERTIES ('has_encrypted_data'='false');
I would like to have a query like:
SELECT * FROM "foo"."test" WHERE foo.usages.usage is null;
When I do that I get:
SYNTAX_ERROR: line 1:53: Expression "foo"."usages" is not of type ROW
If I make my query where I directly index the array as seen in the following it works.
SELECT * FROM "foo"."test" WHERE foo.usages[1].usage is null;
My overall goal though is to query across all items in the usages
array and find any row where at least one item in the usages
array has a member usage
that is null.
You can achieve this by unnesting the array into rows and then check those for null
values. This will result in one row per null
-value entry.
select * from test
CROSS JOIN UNNEST(foo.usages) AS t(i)
where i.usage is null
So if you only need the unique set, you must run this through a SELECT DISTINCT
.
select distinct id from test
CROSS JOIN UNNEST(foo.usages) AS t(i)
where i.usage is null