Search code examples
hiveamazon-athenapresto

Duplicate keys with Amazon Athena and Open JSONx Serde


I have the following data due to an error:

{
  "eventType": "something",
  "details": {
    "userName": "NotSet",
    "username": "[email protected]"
  },
  "createdAt": 3
}

Creating the table works:

CREATE EXTERNAL TABLE tbl (
         eventType string,
         `createdAt` string,
         details string 
) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
 LOCATION 's3://xx/yy'

However when I query (tried details to string, struct, map, always same) I get the duplicate key error:

HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: Duplicate key "username"

They are duplicate if you use them as row columns, but not as map keys, or even string, why does it fail? The org.apache.hive.hcatalog.data.JsonSerDe can skip but I do not like it since %99.5 data is like this. The org.apache.hive.hcatalog.data.JsonSerDe fails always.


Solution

  • It's now possible to set a parameter in order to cater for keys that only differs by the case. The name is case.insensitive and should be set to FALSE

    Example:

    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    WITH SERDEPROPERTIES ("case.insensitive" = "FALSE")
    

    https://docs.aws.amazon.com/athena/latest/ug/json-serde.html#openx-json-serde