I have a hive table with the following schema:
CREATE EXTERNAL TABLE db_test.user_arry(
cstid string,
prdctsslctd array<string>,
indvprc array<bigint>,
dscntamt array<bigint>,
prdctsrjctd array<string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'/location/on/a/hadoop/'
Data present in it, is in below format:
--------------------------------------------------------
name | prdctsslctd | indvprc | dscntamt | prdctsrjctd
--------------------------------------------------------
cctg65 ["m_jns","cbyht"] ["23","6"] ["1","1"] ["shs","jkt"]
jju89o0 ["top","jeans_wmn"] ["55","45"] [NULL] [NULL]
ju34hd ["laychps","candy","toy"]["3","5","67"]["12","8"]["candy"]
Trying to pull this data into a table having the data type as string
for all the columns
CREATE EXTERNAL TABLE db_test.user_strng(
cstid string,
prdctsslctd string,
indvprc string,
dscntamt string,
prdctsrjctd string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS textfile
LOCATION
'/location/on/a/hadoop/';
Using:
insert into db_test.user_strng select * from db_test.user_arry;
Actual O/P:
--------------------------------------------------------
name | prdctsslctd | indvprc | dscntamt | prdctsrjctd
--------------------------------------------------------
cctg65 m_jnscbyht 236 11 shsjkt
jju89o0 topjeans_wmn 5545 NULL NULL
ju34hd laychpscandytoy 3567 128 candy
Expected O/P:
--------------------------------------------------------
name | prdctsslctd | indvprc | dscntamt | prdctsrjctd
--------------------------------------------------------
cctg65 "m_jns","cbyht" "23","6" "1","1" "shs","jkt"
jju89o0 "top","jeans_wmn" "55","45" NULL NULL
ju34hd "laychps","candy","toy" "3","5","67" "12","8" "candy"
Not getting where the things are going wrong Or, missing out on something?
Update_1
O/P from the table after performing the conversion array to array:
ALTER TABLE user_arry CHANGE indvprc indvprc array<string>;
ALTER TABLE user_arry CHANGE dscntamt dscntamt array<string>;
--------------------------------------------------------
name | prdctsslctd | indvprc | dscntamt | prdctsrjctd
--------------------------------------------------------
cctg65 ["m_jns","cbyht"] ["23","6"] ["1","1"] ["shs","jkt"]
jju89o0 ["top","jeans_wmn"] ["55","45"] [] []
ju34hd ["laychps","candy","toy"]["3","5","67"]["12","8"]["candy"]
Final O/P from the table where all datatypes are string:
--------------------------------------------------------
name | prdctsslctd | indvprc | dscntamt | prdctsrjctd
--------------------------------------------------------
cctg65 m_jns cbyht 23 6 1 1 shs jkt
jju89o0 top jeans_wmn 55 45
ju34hd laychps candy toy 3 5 67 12 8 candy
Still not getting the desired o/p.
Update_2
As suggested, did the change for FIELDS TERMINATED BY ','
to FIELDS TERMINATED BY '\t'
. Getting the data in desired format.
Change all arrays type to array <string>
:
alter table ALTER TABLE user_arry CHANGE indvprc indvprc array<string>;
alter table ALTER TABLE user_arry CHANGE dscntamt dscntamt array<string>;
And according to your data example, array<bigint>
is not proper format for values like "23.45"
. array<string>
should work fine for your data file.
Use concat_ws
to convert arrays to comma-delimited strings:
insert into db_test.user_strng
select name,
concat_ws(',',prdctsslctd) as prdctsslctd,
concat_ws(',',indvprc) as indvprc,
concat_ws(',',dscntamt) as dscntamt,
concat_ws(',',prdctsrjctd) as prdctsrjctd
from db_test.user_arry;