How can I INSERT OVERWRITE DIRECTORY with a JSON schema?
There is a raw Hive Avro table (it actually has many fields):
tb_test--------
name string
nickname string
-----------------
Then I want to save a query result into a directory in HDFS via JsonSerDe.
I tried this:
insert overwrite directory '/json/'
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
WITH SERDEPROPERTIES (
  "stat_name"="$._col0",
  "stat_interval"="$._col1"
)
STORED AS TEXTFILE
select name, nickname
from tb_test
limit 100;
But the JSON written to /json/ has _colXX field names instead of the original field names:
{"_col0":"basic_qv","_col1":"h"}
{"_col0":"basic_qv","_col1":"h"}
{"_col0":"basic_qv","_col1":"h"}
{"_col0":"basic_qv","_col1":"h"}
{"_col0":"basic_qv","_col1":"h"}
I expected:
{"name":"basic_qv","nickname":"h"}
{"name":"basic_qv","nickname":"h"}
{"name":"basic_qv","nickname":"h"}
{"name":"basic_qv","nickname":"h"}
{"name":"basic_qv","nickname":"h"}
What would fix this?
Thanks!
It seems your issue, along with a workaround (using a JSON UDF with named_struct), is described here: https://github.com/rcongiu/Hive-JSON-Serde/issues/151
extract.hql:
add jar /home/myuser/lib/json-udf-1.3.8-SNAPSHOT-jar-with-dependencies.jar;
create temporary function tjson as 'org.openx.data.udf.JsonUDF';

insert overwrite local directory '/json/'
select
  tjson(named_struct("name", t.name, "nickname", t.nickname))
from tb_test t;
You can also create a JsonSerDe-based table with the columns defined, INSERT OVERWRITE into
it, and use the table's location instead of a directory.
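A minimal sketch of that table-based approach (the table name tb_test_json and its LOCATION are assumptions, adjust to your environment):

```sql
-- Target table backed by JsonSerDe; its LOCATION plays the role of the
-- output directory, so the files land in HDFS just like with
-- INSERT OVERWRITE DIRECTORY.
CREATE TABLE tb_test_json (
  name string,
  nickname string
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/json/';

-- The JSON keys are taken from the table's column definitions,
-- so the output should use "name" and "nickname" rather than _colXX.
INSERT OVERWRITE TABLE tb_test_json
SELECT name, nickname
FROM tb_test
LIMIT 100;
```

Because the SerDe serializes rows using the target table's schema, the _colXX aliases generated for the query result never reach the output files.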