Search code examples
hadoophivehiveqlhive-serde

taking text file into hive database


i trying to run this code for so long time can somebody tell me what is problem in it code :-

CREATE EXTERNAL TABLE samp_log 
(
ip String ,col1 String ,col2 String , date String , time_hour int ,time_min int 
,time_sec int ,zone int , request String , request_con String , resp_code int 
,resp_byte BIGINT , reference String , ext_reference String , col13 String 
,col14 String ,col15 String , col16 String ,col17 String  
) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' 
WITH SERDEPROPERTIES ("field.delim"=" ,[,]") 
STORED AS TEXTFILE

error - Driver returned: 1. Errors: OK FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe

i also added jar file of hive-contrib..


Solution

  • use RegexSerDe

    https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ApacheWeblogData

    Here is a POC:

    create external table mytable  
    ( 
        ip      string
       ,dt      string
       ,tm      string
       ,tz      string
    )
        row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
        with serdeproperties  
        (
            'input.regex' = '^(.*?) - - \\[(.*?):(.*?) (.*?)\\].*$'
        )
        location '/tmp/mytable'
    ;
    

    select * from mytable
    ;
    

    +-----------------+-------------+------------+------------+
    |   mytable.ip    | mytable.dt  | mytable.tm | mytable.tz |
    +-----------------+-------------+------------+------------+
    | 123.123.123.123 | 26/Apr/2000 | 00:23:48   |      -0400 |
    +-----------------+-------------+------------+------------+