I have the following data due to an error:
{
"eventType": "something",
"details": {
"userName": "NotSet",
"username": "[email protected]"
},
"createdAt": 3
}
Creating the table works:
CREATE EXTERNAL TABLE tbl (
eventType string,
`createdAt` string,
details string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://xx/yy'
However when I query (tried details to string, struct, map, always same) I get the duplicate key error:
HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: Duplicate key "username"
They are duplicate if you use them as row columns, but not as map keys, or even string, why does it fail? The org.apache.hive.hcatalog.data.JsonSerDe
can skip but I do not like it since %99.5 data is like this. The org.apache.hive.hcatalog.data.JsonSerDe
fails always.
It's now possible to set a parameter in order to cater for keys that only differs by the case. The name is case.insensitive
and should be set to FALSE
Example:
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("case.insensitive" = "FALSE")
https://docs.aws.amazon.com/athena/latest/ug/json-serde.html#openx-json-serde