Let's say I want to create a simple table with 4 columns in Hive and load some pipe-delimited data starting with pipe.
CREATE table TEST_1 (
COL1 string,
COL2 string,
COL3 string,
COL4 string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|';
Raw Data:
|123|456|Dasani Bottled|5|,
|124|455|Test Bottled |7|
Table getting date like
----------------------------------
COL1 COL2 COL3 COL4
----------------------------------
123 456 Dasani Bottled
123 455 Test Bottled
----------------------------------
COL1 Getting Empty and last column not loaded. I have try to load the csv file using Hadoop put command.
Help me to resolve this issue.
You can solve this by 2 ways.
cut -d "|" -f 2- input_filename > output_filename
Then use this output_filename as your input to the load process.
-d "|"
- this says, use pipe as a delimiter.
-f 2-
- this says, extract everything after first field.
CREATE table TEST_1 (
dummy string,
COL1 string,
COL2 string,
COL3 string,
COL4 string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|';
And then proceed with the loading data. Then you can ignore this dummy column or store the data into a final table without this column or create a view on top of this to exclude this dummy column.