Search code examples
jsonhadoophivehive-serde

Hive external table with JSON SerDe fetching all NULL values


My data is stored in HDFS at directory /tmp/kafka/alert in multiple files. Each file contain new-line separated JSON objects like following.

{"alertHistoryId":123456,"entityId":123,"deviceId":"123","alertTypeId":1,"AlertStartDate":"Dec 28, 2016 12:05:48 PM"}
{"alertHistoryId":123456,"entityId":125,"deviceId":"125","alertTypeId":5,"AlertStartDate":"Dec 28, 2016 11:58:48 AM"}

I added hive JSON SerDe jar using below

ADD JAR /usr/local/downloads/hive-serdes-1.0-SNAPSHOT.jar;

I created table with following

CREATE EXTERNAL TABLE IF NOT EXISTS my_alert (
alertHistoryId bigint, entityId bigint, deviceId string, alertTypeId int,  AlertStartDate string
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/tmp/kafka/alert';

table created successfully. But when I fetched data, I got all null values. Anyone got any idea how to resolve this?


Solution

  • You are using old version of JSON Serde. There might be an issue with your JSON Serde and Hadoop Distribution. Please find below link to get new version of Json Serde. Follow the steps from the link to build it according to your Hadoop distribution.

    https://github.com/rcongiu/Hive-JSON-Serde

    Please see below working example.

    hive> add jar /User/User1/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar;
    Added [/User/User1/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar] to class path
    Added resources: [/User/User1/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar]
    hive> use default;
    OK
    Time taken: 0.021 seconds
    hive> CREATE EXTERNAL TABLE IF NOT EXISTS json_poc (
        > alertHistoryId bigint, entityId bigint, deviceId string, alertTypeId int,  AlertStartDate string
        > )
        > ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
        > LOCATION '/User/User1/sandeep_poc/hive_json';
    OK
    Time taken: 0.077 seconds
    hive> select * from json_poc;
    OK
    123456  123     123     1       Dec 28, 2016 12:05:48 PM
    123456  125     125     5       Dec 28, 2016 11:58:48 AM
    Time taken: 0.052 seconds, Fetched: 2 row(s)
    hive>
    

    How to build jar.

    Maven should be installed on your PC then run command like this.

    C:\Users\User1\Downloads\Hive-JSON-Serde-develop\Hive-JSON-Serde-develop>mvn -Phdp23 clean package

    In my case I am using hdp2.3 so I have provided -Phdp23

    Hope it will help if you are willing to use Hive JSON Serde.