Search code examples
csvhivecreate-tablehive-serdehiveddl

csv file to hive table using load data - How to format the date in csv to accept by hive table


I am using load data syntax to load a csv file to a table.The file is same format as hive accepts. But still after load data is issued, Last 2 columns returns null on select.

1750,651,'2013-03-11','2013-03-17'
1751,652,'2013-03-18','2013-03-24'
1752,653,'2013-03-25','2013-03-31'
1753,654,'2013-04-01','2013-04-07'

create table dattable(
DATANUM    INT,  
ENTRYNUM BIGINT, 
START_DATE  DATE,
END_DATE    DATE ) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;

 LOAD DATA LOCAL INPATH '/path/dtatable.csv' OVERWRITE INTO TABLE dattable ;

Select returns NULL values for the last 2 cols

Other question was what if the date format is different than YYYY-MM-DD. is it possible to make hive identify the format? (Because right now i am modifying the csv file format to accept by hive)


Solution

  • LasySimpleSerDe (default) does not work with quoted CSV. Use CSVSerDe:

    create table dattable(
    DATANUM    INT,  
    ENTRYNUM BIGINT, 
    START_DATE  DATE,
    END_DATE    DATE ) 
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
       "separatorChar" = ",",
       "quoteChar"     = "'"
    )  
    STORED AS TEXTFILE;
    

    Also read this: CSVSerDe treats all columns to be of type String

    Define you date columns as string and apply conversion in select.