database hadoop database-schema cloudera impala

load data into impala partitioned table

I have data in HDFS in following dir structure :

/exported/2014/07/01/00/SEARCHES/part-m-00000.bz2
                                 part-m-00001.bz2
/exported/2014/07/01/02/SEARCHES/part-m-00000.bz2
                                 part-m-00001.bz2
                                 part-m-00003.bz2
.
.
.
.
/exported/2014/08/01/09/SEARCHES/part-m-00005 .bz2

there are multiple part files in each subdirectory.

I want to load this dataset into impala table, so use following query to create table :

CREATE EXTERNAL TABLE search(time_stamp TIMESTAMP, ..... url STRING,domain STRING) PARTITIONED BY (year INT, month INT, day INT. hour INT) row format delimited fields terminated by '\t';

Then

ALTER TABLE search ADD PARTITION (year=2014, month=08, day=01) LOCATION '/data/jobs/exported/2014/08/01/*/SEARCHES/';

But it failed to load with following error :

ERROR: AnalysisException: Failed to load metadata for table: magneticbi.search_mmx
CAUSED BY: TableLoadingException: Failed to load metadata for table: search_mmx
CAUSED BY: RuntimeException: Compressed text files are not supported: part-m-00000.bz2

not sure what is the correct way to do this.

Anyone can help in this ?

Thanks

Solution

Here's a link to a table from Cloudera that describes your options. To summarize:

Impala supports the following compression codecs:

Snappy. Recommended for its effective balance between compression ratio and decompression speed. Snappy compression is very fast, but GZIP provides greater space savings. Not supported for text files.
GZIP. Recommended when achieving the highest level of compression (and therefore greatest disk-space savings) is desired. Not supported for text files. Deflate. Not supported for text files.
BZIP2. Not supported for text files.
LZO, for Text files only. Impala can query LZO-compressed Text tables, but currently cannot create them or insert data into them; perform these operations in Hive.