PigLatin - insert data into existing partition?

I have a file test_file_1.txt containing:


and file test_file_2.txt containing:


In HCatalog there is a table:

create table stage.partition_pk (value string)
Partitioned by(date string)
stored as orc;

These two scripts work nicely:

Script 1:

LoadFile = LOAD 'test_file_1.txt' using PigStorage(',') AS (date : chararray, wartosc : chararray);
store LoadFile into 'stage.partition_pk' using org.apache.hcatalog.pig.HCatStorer();

Script 2:

LoadFile = LOAD 'test_file_2.txt' using PigStorage(',') 
AS (date : chararray, wartosc : chararray);
store LoadFile into 'stage.partition_pk' using org.apache.hcatalog.pig.HCatStorer();

Table partition_pk contains four partitions - everything is as expected.

But let's say there is another file containing data that should be inserted into one of the existing partitions. Pig is unable to write into a partition that already contains data (or did I miss something?). How do you manage loading into existing partitions (or into non-empty, non-partitioned tables)? Do you read the partition, union it with the new data, delete the partition (how?) and insert it as a new partition?


  • HCatalog's documentation says: "Once a partition is created records cannot be added to it, removed from it, or updated in it." So, by the nature of HCatalog, you can't add data to an existing partition that already has data in it.

    There are bugs around this that are being worked on, and some of them were fixed in Hive 0.13:
      - (Still unresolved) The bug used to track the other bugs
      - (Resolved in 0.13) Separate table property for mutable tables
      - (Still unresolved) Specific to dynamic partitioning
      - (Resolved in 0.13) Specific to static partitioning
      - (Still unresolved) Adds DDL support to HCatalog
    Basically, it looks like if you don't need dynamic partitioning, then 0.13 might work for you. You just need to remember to set the appropriate property.

    What I've found works for me is to create another partition key that I call build_num. I then pass the value of this parameter on the command line and set it in the store statement, like so:

    create table stage.partition_pk (value string)
    Partitioned by(date string, build_num string)
    stored as orc;

    STORE LoadFile into 'stage.partition_pk' using org.apache.hcatalog.pig.HCatStorer('build_num=${build_num}');

    Just don't include the build_num partition in your queries. I generally set build_num to a timestamp of when I run the job.
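    A slightly fuller sketch of the build_num approach, under these assumptions: the table was created with both partition keys as above, the HCatalog jars are on Pig's classpath (for example via the -useHCatalog flag on Pig releases that support it), and the new rows sit in a file named test_file_3.txt, which is a hypothetical name, as is the script name load_build.pig. As in the store statement above, date comes from each row (dynamic partitioning) while build_num is supplied statically through Pig's -param substitution:

    ```pig
    -- load_build.pig (hypothetical script name)
    -- Example invocation (the build_num value here is just a timestamp-like string):
    --   pig -useHCatalog -param build_num=20140101120000 load_build.pig
    LoadFile = LOAD 'test_file_3.txt' USING PigStorage(',')
        AS (date:chararray, value:chararray);

    -- Reorder so the table column comes first and the dynamic partition
    -- column (date) last, mirroring the table layout; build_num is not
    -- part of the data, it is given as a static partition value below.
    ToStore = FOREACH LoadFile GENERATE value, date;

    -- build_num is fixed for the whole run; date is taken from each row.
    STORE ToStore INTO 'stage.partition_pk'
        USING org.apache.hcatalog.pig.HCatStorer('build_num=${build_num}');
    ```

    Each run writes into a fresh (date, build_num) partition, so no existing partition is ever appended to. When reading the table back with org.apache.hcatalog.pig.HCatLoader, filtering only on date returns the rows from every build_num sub-partition of that day.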