Search code examples
sqljsonprestoamazon-athena

In Athena how do I query a member of a struct in an array in a struct?


I am trying to figure out how to query where I am checking the value of usage given the following table creation:

CREATE EXTERNAL TABLE IF NOT EXISTS foo.test (
     `id` string,
     `foo` struct< usages:array< struct< usage:string,
     method_id:int,
     start_at:string,
     end_at:string,
     location:array<string> >>> 
) PARTITIONED BY (
         timestamp date 
) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
         'serialization.format' = '1' ) LOCATION 's3://foo.bar/' TBLPROPERTIES ('has_encrypted_data'='false');

I would like to have a query like:

SELECT * FROM "foo"."test" WHERE foo.usages.usage is null;

When I do that I get:

SYNTAX_ERROR: line 1:53: Expression "foo"."usages" is not of type ROW

If I make my query where I directly index the array as seen in the following it works.

SELECT * FROM "foo"."test" WHERE foo.usages[1].usage is null;

My overall goal though is to query across all items in the usages array and find any row where at least one item in the usages array has a member usage that is null.


Solution

  • You can achieve this by unnesting the array into rows and then check those for null values. This will result in one row per null-value entry.

    select * from test
    CROSS JOIN UNNEST(foo.usages) AS t(i)
    where i.usage is null
    

    So if you only need the unique set, you must run this through a SELECT DISTINCT.

    select distinct id from test
    CROSS JOIN UNNEST(foo.usages) AS t(i)
    where i.usage is null