Search code examples
csvhadoophivedelimiterhiveddl

Hive - Load pipe delimited data starting with pipe


Let's say I want to create a simple table with 4 columns in Hive and load some pipe-delimited data starting with pipe.

CREATE table TEST_1 (
COL1  string,
COL2  string,
COL3  string,
COL4  string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|';

Raw Data:

|123|456|Dasani Bottled|5|,   
|124|455|Test Bottled |7|      

Table getting date like

----------------------------------
COL1  COL2   COL3     COL4
----------------------------------
      123   456     Dasani Bottled
      123   455     Test Bottled
----------------------------------

COL1 Getting Empty and last column not loaded. I have try to load the csv file using Hadoop put command.

Help me to resolve this issue.


Solution

  • You can solve this by 2 ways.

    1. Remove first column before processing the file. This is clean and preferable solution.
    cut -d "|" -f 2- input_filename > output_filename
    

    Then use this output_filename as your input to the load process.

    -d "|" - this says, use pipe as a delimiter. -f 2- - this says, extract everything after first field.

    1. add a dummy column in the beginning of the table like this
    CREATE table TEST_1 (
    dummy string,
    COL1  string,
    COL2  string,
    COL3  string,
    COL4  string
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '|';
    

    And then proceed with the loading data. Then you can ignore this dummy column or store the data into a final table without this column or create a view on top of this to exclude this dummy column.