I crawled data using aws glue to import json data from an s3 folder that contains data where the root braces is an array like this:
[{id: '1', name: 'rick'},{id: '2', name: 'morty'}]
This ends up resulting in a schema like this:
array<struct<expand:string,id:string,name:string>>
How do I query by name
in Athena?
If I try this:
SELECT * FROM people_s3_buckets WHERE name = "rick";
I get the following error:
SYNTAX_ERROR: Column 'name' cannot be resolved
Perhaps, there is a way to setup the Glue crawler to add just the elements within the array and avoid the nesting alltogether?
In order to query fields of elements within an array, you would need to UNNEST
it first. Assuming that structure array<struct<expand:string,id:string,name:string>>
corresponds to column members
, you would need to do
SELECT
*
FROM
people_s3_buckets,
UNNEST(members) as t(member)
WHERE
member.name = 'rick'
Note, you need to use single quotes instead fo double quotes.
Here is the official AWS docs on handling arrays in AWS Athena: Querying Arrays