Search code examples
jsonhadoophivehiveql

HIVE insert overwrite directory with json format


How can I insert overwrite directory with json schema?

There is raw hive avro table; (this is actually has many fields)

tb_test--------
name string
kickname string
-----------------

then I want to save query result into some directory in hdfs by jsonserde.

I tried this.

insert overwrite directory '/json/'
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
WITH SERDEPROPERTIES (
 "stat_name"="$._col0",
 "stat_interval"="$._col1"
)
STORED AS TEXTFILE 
select name, nickname
from tb_test limit 100

But written json in /json/ has _colXX field name instead of the origin field name.

{"_col0":"basic_qv"," _col1":"h"}
{"_col0":"basic_qv","_col1 ":"h"}
{"_col0":"basic_qv","_col1 ":"h"}
{"_col0":"basic_qv"," _col1":"h"}
{"_col0":"basic_qv","_col1 ":"h"}

I expected

{"name":"basic_qv","nickname":"h"}
{"name":"basic_qv","nickname":"h"}
{"name":"basic_qv","nickname":"h"}
{"name":"basic_qv","nickname":"h"}
{"name":"basic_qv","nickname":"h"}

What will help this?

Thanks!!


Solution

  • It seems your issue with a workaround (using JsonUDF with named_struct) is described here: https://github.com/rcongiu/Hive-JSON-Serde/issues/151

    extract.hql:
    add jar /home/myuser/lib/json-udf-1.3.8-SNAPSHOT-jar-with-dependencies.jar;
    create temporary function tjson as 'org.openx.data.udf.JsonUDF';
    
    insert overwrite local directory '/json/'
    select
    tjson(named_struct("name", t.name,"nickname", t.nickname))
    from tb_test t
    ;
    

    Also you can create JsonSerDe-based table with columns defined, insert overwrite it and use table location instead of directory.