Search code examples
sqlarrayshivehiveqlcreate-table

HIve: Data format changing while converting from ORC to TEXT


I have a hive table with the following schema:

CREATE EXTERNAL TABLE db_test.user_arry(
  cstid string, 
  prdctsslctd array<string>, 
  indvprc array<bigint>, 
  dscntamt array<bigint>, 
  prdctsrjctd array<string>)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
   '/location/on/a/hadoop/'

Data present in it, is in below format:

--------------------------------------------------------  
 name | prdctsslctd | indvprc | dscntamt | prdctsrjctd 
--------------------------------------------------------   
 cctg65  ["m_jns","cbyht"]        ["23","6"]       ["1","1"] ["shs","jkt"]
 jju89o0 ["top","jeans_wmn"]      ["55","45"]      [NULL]         [NULL]
 ju34hd  ["laychps","candy","toy"]["3","5","67"]["12","8"]["candy"]

Trying to pull this data into a table having the data type as string for all the columns

CREATE EXTERNAL TABLE db_test.user_strng(
  cstid string, 
  prdctsslctd string, 
  indvprc string, 
  dscntamt string, 
  prdctsrjctd string)
ROW FORMAT DELIMITED 
 FIELDS TERMINATED BY ',' 
 LINES TERMINATED BY '\n'
STORED AS textfile
LOCATION
 '/location/on/a/hadoop/';

Using:

insert into db_test.user_strng select * from db_test.user_arry;

Actual O/P:

--------------------------------------------------------  
 name | prdctsslctd | indvprc | dscntamt | prdctsrjctd 
--------------------------------------------------------   
 cctg65  m_jnscbyht        236       11     shsjkt
 jju89o0 topjeans_wmn      5545      NULL   NULL
 ju34hd  laychpscandytoy   3567      128    candy

Expected O/P:

--------------------------------------------------------  
 name | prdctsslctd | indvprc | dscntamt | prdctsrjctd 
--------------------------------------------------------   
 cctg65  "m_jns","cbyht"          "23","6"         "1","1"   "shs","jkt"
 jju89o0 "top","jeans_wmn"        "55","45"         NULL           NULL
 ju34hd  "laychps","candy","toy"  "3","5","67"     "12","8"  "candy"

Not getting where the things are going wrong Or, missing out on something?

Update_1

O/P from the table after performing the conversion array to array:

ALTER TABLE user_arry CHANGE indvprc indvprc array<string>;
ALTER TABLE user_arry CHANGE dscntamt dscntamt array<string>;


--------------------------------------------------------  
 name | prdctsslctd | indvprc | dscntamt | prdctsrjctd 
--------------------------------------------------------   
 cctg65  ["m_jns","cbyht"]        ["23","6"]       ["1","1"] ["shs","jkt"]
 jju89o0 ["top","jeans_wmn"]      ["55","45"]      []         []
 ju34hd  ["laychps","candy","toy"]["3","5","67"]["12","8"]["candy"]

Final O/P from the table where all datatypes are string:

--------------------------------------------------------  
 name | prdctsslctd | indvprc | dscntamt | prdctsrjctd 
--------------------------------------------------------   
 cctg65  m_jns    cbyht      23   6    1      1      shs  jkt
 jju89o0 top      jeans_wmn  55   45       
 ju34hd  laychps  candy      toy  3    5      67     12    8    candy

Still not getting the desired o/p.

Update_2

As suggested, did the change for FIELDS TERMINATED BY ',' to FIELDS TERMINATED BY '\t' . Getting the data in desired format.


Solution

  • Change all arrays type to array <string>:

    alter table ALTER TABLE user_arry CHANGE indvprc indvprc array<string>;
    alter table ALTER TABLE user_arry CHANGE dscntamt dscntamt array<string>;
    

    And according to your data example, array<bigint> is not proper format for values like "23.45". array<string> should work fine for your data file.

    Use concat_ws to convert arrays to comma-delimited strings:

    insert into db_test.user_strng 
    select  name, 
            concat_ws(',',prdctsslctd)  as prdctsslctd,
            concat_ws(',',indvprc)      as indvprc,
            concat_ws(',',dscntamt)     as dscntamt,
            concat_ws(',',prdctsrjctd) as prdctsrjctd 
      from db_test.user_arry;