
Remove or map duplicate keys in a Hive table?


I have JSON files to load into a Hive table, but they contain duplicate keys, which makes all the data come back as NULL or makes the table impossible to query with SELECT.

The JSON files contain records like this:

{"timeSeries":"17051233123","id":"123","timeseries":"17051233123","name":"sample"}

I tried to create the Hive table like this:

CREATE EXTERNAL TABLE table_hive (`id` STRING, `name` STRING, `timeseries` STRING, `timeseries2` STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( "mapping.timeseries2" = "timeSeries") 
LOCATION 'app/jsonfile.json';

How can I make this a queryable Hive table?


Solution

  • Works fine with the JSON SerDe that comes with the Hive distribution, declaring only a single timeseries column and no mapping property

    create external table table_hive 
    (
        id          string
       ,name        string   
       ,timeseries  string
    )
    row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
    stored as textfile
    ;
    

    select * from table_hive
    ;
    

    +-----+--------+-------------+
    | id  |  name  | timeseries  |
    +-----+--------+-------------+
    | 123 | sample | 17051233123 |
    +-----+--------+-------------+
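
If switching SerDes is not an option, another approach is to normalize the files before Hive reads them. Hive column names are case-insensitive, so "timeSeries" and "timeseries" end up competing for the same column. Below is a minimal Python sketch, not part of the original answer, that lowercases every key and keeps the first value seen for each one. It assumes one JSON object per line and that dropping the later duplicate is acceptable; the names `dedupe_keys` and `_merge_pairs` are hypothetical.

```python
import json


def _merge_pairs(pairs):
    """Build a dict from (key, value) pairs, lowercasing keys.

    When two keys differ only by case, the first value seen is kept.
    """
    merged = {}
    for key, value in pairs:
        merged.setdefault(key.lower(), value)
    return merged


def dedupe_keys(line):
    """Return the JSON object in `line` with case-insensitive duplicate keys removed."""
    # object_pairs_hook receives every object's pairs in document order,
    # so nested objects are normalized the same way as the top level.
    record = json.loads(line, object_pairs_hook=_merge_pairs)
    return json.dumps(record, separators=(",", ":"))


sample = '{"timeSeries":"17051233123","id":"123","timeseries":"17051233123","name":"sample"}'
print(dedupe_keys(sample))
# prints {"timeseries":"17051233123","id":"123","name":"sample"}
```

Running each input line through `dedupe_keys` and writing the result to the table's location would leave files with a single timeseries key per record, which either SerDe should be able to read without a mapping property.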