Search code examples
hivefixed-widthhive-serde

Unable to load hive table with fixed width file using SerDe


I have fixed width file having 5 fixed width column. Here is the structure of the file. {col1:3char, col2:35char, col3:3char, col4:11char, col5:4char}. Here is the sample file

111  SagarKhatavkar                     030       9999ABIT

112  VishalKataria                      028       9999ABIT

113  GauravSomvanshi                    032       9999ABIT

114  SonalKartekiya                     029       9999ABIT

So as suggested in other posts I use RegEx. Here is the DDL I created.

CREATE TABLE emp (emplid STRING, name STRING, age String, salary String, dept STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
     "input.regex" = "(.{4})(.{35})(.{3})(.{11})(.{4})", 
     "output.format.string" = "%1$s %2$s %3$s %4$s %5$s"
     )

After using load from local to load the data in this table gives all values as NULL.

load data local inpath '/home/test1/emp.txt' into table emp;

Running select on the table
hive> select * from emp;

OK

NULL    NULL    NULL    NULL    NULL

NULL    NULL    NULL    NULL    NULL

NULL    NULL    NULL    NULL    NULL

NULL    NULL    NULL    NULL    NULL

Time taken: 0.959 seconds, Fetched: 4 row(s)

Please advise what's wrong with the DDL? I am using 2.4.2.0-258 version of Hive.


Solution

  • I got the issue. There was an issue with first column data. The file supposed to have EXACT 57 bytes. Otherwise it would mark the complete row as NULL.