Search code examples
amazon-s3prestotrinojson-extract

Query JSON file in Presto in S3


I have a file in S3, and Presto running on EMR. I see I can use Json_extract to read the json.

I am running the following query, however, I keep seeing null instead of the correct value.

select json_extract('s3a://random-s3-bucket/analytics/20210221/myjsonfile.json', '$.dateAvailability')

I see this output

enter image description here

Not sure if my syntax is wrong? Thoughts?


Solution

  • json_extract() operates on JSON scalar values kept in memory. It does not load data from an external location. See documentation page for usage examples.

    In order to query a JSON file using Trino (formerly known as Presto SQL), you need to map it as a table with JSON format like this:

    CREATE TABLE my_table ( .... )
    WITH (
        format = 'JSON',
        external_location = 's3a://random-s3-bucket/analytics/20210221'
    );
    

    See more information in Hive connector documentation.