Search code examples
amazon-web-servicesamazon-athenaaws-glue-data-catalog

AWS Athena: Querying by an attributes of a struct with an array


I crawled data using aws glue to import json data from an s3 folder that contains data where the root braces is an array like this:

[{id: '1', name: 'rick'},{id: '2', name: 'morty'}]

This ends up resulting in a schema like this:

array<struct<expand:string,id:string,name:string>>

How do I query by name in Athena?

If I try this:

SELECT * FROM people_s3_buckets WHERE name = "rick";

I get the following error:

SYNTAX_ERROR: Column 'name' cannot be resolved

Perhaps, there is a way to setup the Glue crawler to add just the elements within the array and avoid the nesting alltogether?


Solution

  • In order to query fields of elements within an array, you would need to UNNEST it first. Assuming that structure array<struct<expand:string,id:string,name:string>> corresponds to column members, you would need to do

    SELECT
        *
    FROM
        people_s3_buckets,
        UNNEST(members) as t(member)
    WHERE
        member.name = 'rick'
    

    Note, you need to use single quotes instead fo double quotes.

    Here is the official AWS docs on handling arrays in AWS Athena: Querying Arrays