Search code examples
hivehiveqldatabase-partitioning

Partitioned by non-first column


I have a table, which created using following hiveQl-script:

CREATE EXTERNAL TABLE Logs 
(
    ip STRING,
    time STRING,
    query STRING,
    pageSize STRING,
    statusCode STRING,
    browser STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
   -- some regexps
)
STORED AS TEXTFILE
LOCATION '/path';

I need to create partitioning by time field. But in all examples I saw, that partitioning creates only by first field or by the sequence of fields starting at first. Also I saw, that if I write the field in PARTITIONED BY section, I mustn't write it in CREATE TABLE section.

I tried to create partitioning by time in several ways but always cought different exceptions.

For example this:

ParseException line 11:20 cannot recognize input near ')' 'ROW' 'FORMAT' in column type

or this:

ParseException line 16:0 missing EOF at 'PARTITIONED' near ')'

and so on.

So, how can I create partitioning by time field in my case?


Solution

  • The partition column in hive is not a real column.It just gives hive the hint where to find the files of specific partition.

    So if you have a file that you want to store the file into different partitions based on one column in this file.There is no aotumatic way to do this,you have to split the input file on your own and load the splited file into different partition.(In case you dont know how to split a file based on column,use awk {print $0>>"filebase."$2;})

    Or you can load your input to an unpartitioned table first.And then use a query to insert these data to another partitioned table.

    I hope this can help.