Search code examples
sqljsonamazon-athenapyathena

How to access nested arrays and JSON in AWS Athena


I'm trying to process some data from s3 logs in Athena that has a complex type I cannot figure out how to work with.

I have a table with rows such as:

data
____
"[{\"k1\":\"value1\", \"key2\":\"value2\"...}]"

I'd like to treat it as (1) an array to extract the first element, and then that first element as the JSON that it is.

Everything is confused because the data naturally is a string, that contains an array, that contains json and I don't even know where to start


Solution

  • You can use the following combination of JSON commands:

    SELECT 
         JSON_EXTRACT_SCALAR(
            JSON_EXTRACT_SCALAR('"[{\"k1\":\"value1\", \"key2\":\"value2\"...}]"','$'),
           '$[0].k1'
         )
    

    The inner JSON_EXTRACT_SCALAR will return the JSON ARRAY [{"k1":"value1", "key2":"value2"...}] and the outer will return the relevant value value1

    Another similar option is to use CAST(JSON :

    SELECT 
         JSON_EXTRACT_SCALAR(
            CAST(JSON '"[{\"k1\":\"value1\", \"key2\":\"value2\"...}]"' as VARCHAR),
           '$[0].k1'
         )