I have fixed width file having 5 fixed width column. Here is the structure of the file. {col1:3char, col2:35char, col3:3char, col4:11char, col5:4char}
. Here is the sample file
111 SagarKhatavkar 030 9999ABIT
112 VishalKataria 028 9999ABIT
113 GauravSomvanshi 032 9999ABIT
114 SonalKartekiya 029 9999ABIT
So as suggested in other posts I use RegEx. Here is the DDL I created.
CREATE TABLE emp (emplid STRING, name STRING, age String, salary String, dept STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "(.{4})(.{35})(.{3})(.{11})(.{4})",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s"
)
After using load from local to load the data in this table gives all values as NULL.
load data local inpath '/home/test1/emp.txt' into table emp;
Running select on the table
hive> select * from emp;
OK
NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL
Time taken: 0.959 seconds, Fetched: 4 row(s)
Please advise what's wrong with the DDL? I am using 2.4.2.0-258 version of Hive.
I got the issue. There was an issue with first column data. The file supposed to have EXACT 57 bytes. Otherwise it would mark the complete row as NULL.