I have a file in S3 with the following contents:
{"foo-bar": {"name":"Mercury","distanceFromSun":0.39,"orbitalPeriod":0.24,"dayLength":58.65}}{"foo-bar": {"name":"Venus","distanceFromSun":0.72,"orbitalPeriod":0.62,"dayLength":243.02}}
The Glue table definition is:
CREATE EXTERNAL TABLE `planets_ion_2`(
`foo-bar` struct<name:string,distancefromsun:decimal(38,18),orbitalperiod:decimal(38,18),daylength:decimal(38,18)> COMMENT '')
ROW FORMAT SERDE
'com.amazon.ionhiveserde.IonHiveSerDe'
STORED AS INPUTFORMAT
'com.amazon.ionhiveserde.formats.IonInputFormat'
OUTPUTFORMAT
'com.amazon.ionhiveserde.formats.IonOutputFormat'
LOCATION
's3://<some_bucket>/AthenaDataStore/Planets2'
TBLPROPERTIES (
'transient_lastDdlTime'='1740775321')
I get null values for the "name" key when I query Athena like so:
select "foo-bar".name from planets_ion_2
If I remove the dash from the top level key in the S3 json, from "foo-bar" to "foobar", it works fine:
{"foobar": {"name":"Mercury","distanceFromSun":0.39,"orbitalPeriod":0.24,"dayLength":58.65}}{"foobar": {"name":"Venus","distanceFromSun":0.72,"orbitalPeriod":0.62,"dayLength":243.02}}
CREATE EXTERNAL TABLE `planets_ion_2`(
`foobar` struct<name:string,distancefromsun:decimal(38,18),orbitalperiod:decimal(38,18),daylength:decimal(38,18)> COMMENT '')
ROW FORMAT SERDE
'com.amazon.ionhiveserde.IonHiveSerDe'
STORED AS INPUTFORMAT
'com.amazon.ionhiveserde.formats.IonInputFormat'
OUTPUTFORMAT
'com.amazon.ionhiveserde.formats.IonOutputFormat'
LOCATION
's3://<some_bucket>/AthenaDataStore/Planets2'
TBLPROPERTIES (
'transient_lastDdlTime'='1740775321')
So I can't seem to escape the dash. I've tried a lot of things without success:
select `foo-bar`.name from planets_ion_2
select `foo-bar.name` from planets_ion_2
Any time I try to escape with backtick, I get the error Queries of this type are not supported
I'm not sure if this a Athena or Presto issue. But thanks in advance for any help.
You need to add the following after ROW FORMAT SERDE
WITH SERDEPROPERTIES ("ion.foo-bar.path_extractor" = "('foo-bar')")
With the dash in the name it can't find the path to the data. Use the property path_extractor to create an alias for it with single quotes around the key name since it has a dash in it.
So the create table statement would look something like:
CREATE EXTERNAL TABLE `planets_ion_2`(
`foo-bar` struct<name:string,distancefromsun:decimal(38,18),orbitalperiod:decimal(38,18),daylength:decimal(38,18)> COMMENT '')
ROW FORMAT SERDE
'com.amazon.ionhiveserde.IonHiveSerDe'
WITH SERDEPROPERTIES ("ion.foo-bar.path_extractor" = "('foo-bar')")
STORED AS INPUTFORMAT
'com.amazon.ionhiveserde.formats.IonInputFormat'
OUTPUTFORMAT
'com.amazon.ionhiveserde.formats.IonOutputFormat'
LOCATION
's3://<some_bucket>/AthenaDataStore/Planets2'
TBLPROPERTIES (
'transient_lastDdlTime'='1740775321')
You should then be able to query name using
select "foo-bar".name from from planets_ion_2
Side note - Amazon Ion Hive SerDe is used to query Amazon Ion format data, but can be used to query non-Amazon Ion JSON datasets.
See this note in the link
Because Amazon Ion is a superset of JSON, you can use the Amazon Ion Hive SerDe to query non-Amazon Ion JSON datasets. Unlike other JSON SerDe libraries, the Amazon Ion SerDe does not expect each row of data to be on a single line. This feature is useful if you want to query JSON datasets that are in "pretty print" format or otherwise break up the fields in a row with newline characters.
If you explore the other JSON SerDe libraries you would still need to account for the dash in the key name.
As an example using OpenX JSON SerDe you would have to provide the property like
WITH SERDEPROPERTIES ("mapping.foo-bar" = "foo-bar")
to "remap" that path to account for the dash.
CREATE EXTERNAL TABLE `planets_ion_2`(
`foo-bar` struct<name:string,distancefromsun:decimal(38,18),orbitalperiod:decimal(38,18),daylength:decimal(38,18)> COMMENT '')
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("mapping.foo-bar" = "foo-bar")
LOCATION
's3://<some_bucket>/AthenaDataStore/Planets2'