Search code examples
sqljsonprestoamazon-athenatrino

How can use WHERE clause in AWS Athena Json queries?


I have a table where I've stored some information from a Json object:

Table:

investment
    unit(string)
    data(string)

If a run the the query SELECT * FROM "db"."investment" limit 10; I got the following result:

Unit Data
CH  [{"from":"CH","when":"2021-02-16","who":"pp@gmail.com"}]
AB  [{"from":"AB","when":"2020-02-16","who":"jj@gmail.com"}]

Now, I run the following basic query to return value within the Json nested object:

SELECT json_extract_scalar(Data, '$[0].who') email FROM "db"."investment";

and I got the following result:

email
jj@gmail.com
pp@gmail.com

How can filter this query with WHERE clause to return just a single value:

I've tried this, but obviously it doesn't work as normal SQL table with row and columns:

SELECT json_extract_scalar(Data, '$[0].who') email FROM "db"."investment" WHERE email = "pp@gmail.com";

Any help with this?


Solution

  • your question seems to have a few typos.

    • Date in Unit Date should probably be Data
    • what is key referring to. Perhaps you mean Data

    also, note that athena is case insensitive, and column names are converted to lower case (even if you quote them).

    with that out of the way, you have to use the full expression that extracts your email from the json document in the where clause. the column alias defined is not accessible to the rest of the query.

    here's a self contained example:

    with test (unit, data) as (
    values
    ('CH',  JSON '[{"from":"CH","when":"2021-02-16","who":"pp@gmail.com"}]'),
    ('AB',  JSON '[{"from":"AB","when":"2020-02-16","who":"jj@gmail.com"}]')
    )
    select json_extract_scalar(data, '$[0].who') email
    from test
    where json_extract_scalar(data, '$[0].who') = 'pp@gmail.com';
    
    outputs: 
    | email        |
    +--------------+
    | pp@gmail.com |