Search code examples
jsonformatamazon-athenajson-deserializationtrino

Athena (Trino SQL) parsing JSON document using fields (dot notation)


Athena (Trino SQL) parsing JSON document (table column called document 1 in Athena) using fields (dot notation)

If the underlying json (table column called document 1 in Athena) is in the form of {a={b ... I can parse it in Athena (Trino SQL) using

document1.a.b

However, if the JSON contains {a={"text": value1 ...

the quote marks will not parse correctly.

Is there a way to do JSON parsing of a 'field' with quotes?

If not, is there an elegant way of parsing the "text" and obtain the string in value 1? [Please see my comment below].

I cannot change the quotes in the json and its Athena "table" so I would need something that works in Trino SQL syntax.

The error message is in the form of: SQL Error [100071] [HY000]: [Simba][AthenaJDBC](100071) An error has been thrown from the AWS Athena client. SYNTAX_ERROR: Expression [redacted] is not of type ROW

NOTE: This is not a duplicate of Oracle Dot Notation Question


Solution

  • Dot notation works only for columns types as struct<…>. You can do that for JSON data, but judging from the error and your description this seems not to be the case. I assume your column is of type string.

    If you have JSON data in a string column you can use JSON functions to parse and extract parts of them with JSONPath.