Search code examples
sqlhadoophivehiveqlhadoop-partitioning

How to overwrite columns value by selecting another columns in partition table in hive


Hi how to overwrite columns value by selecting same partition table in hive.

I have created table by executing below query

CREATE TABLE user (fname string,lname string) partitioned By (day int);

And i insert the data , after inserting data into table . I executed select query it looks like below:

AA  AA  20170201
BB  BB  20170201
CC  CC  20170201
DD  DD  20170202
EE  EE  20170203

As per my requirement, I want to add one more column to my table(user) ,with the help of below query I added.

ALTER TABLE user ADD COLUMNS ( day2 int);

After adding column,my table look like below

AA  AA  NULL    20170201
BB  BB  NULL    20170201
CC  CC  NULL    20170201
DD  DD  NULL    20170202
EE  EE  NULL    20170203

But I want table like.

AA  AA  20170201    20170202
BB  BB  20170201    20170202
CC  CC  20170201    20170202
DD  DD  20170202    20170202
EE  EE  20170203    20170203

So I executed below query

insert overwrite table user partition (day)
select
    fname,
    lname,
    day as day2,
    case 
        when day <= 20170202 then 20170202
        when day > 20170202 then day
    end as day
from user;

then I have executed select query like below

select * from user;

result is :

AA  AA  NULL    20170201
BB  BB  NULL    20170201
CC  CC  NULL    20170201
AA  AA  NULL    20170202
BB  BB  NULL    20170202
CC  CC  NULL    20170202
DD  DD  NULL    20170202
EE  EE  NULL    20170203

why i am getting null values.Can please let me know anything I missed out, let how to achieve this.i.e

   AA   AA  20170201    20170202
    BB  BB  20170201    20170202
    CC  CC  20170201    20170202
    DD  DD  20170202    20170202
    EE  EE  20170203    20170203

Solution

  • Following the behaviour of partitioned Hive Table, HiVE's schema definitions are maintained at partition level.So older partition may not have received schema updates.

    Detailed Query

    hive> drop table test_insert;
    OK
    hive> CREATE TABLE test_insert (fname string,lname string) partitioned By (day int);
    OK
    hive> insert into test_insert partition (day=20170202) values('f','l');
    OK
    hive> insert into test_insert partition (day=20170203) values('f','l');
    OK
    hive> select * from test_insert;
    OK
    f   l   20170202
    f   l   20170203
    Time taken: 0.148 seconds, Fetched: 2 row(s)
    hive> ALTER TABLE test_insert ADD COLUMNS ( day2 int);
    OK
    Time taken: 0.193 seconds
    hive> select * from test_insert;
    OK
       f    l   NULL    20170202
       f    l   NULL    20170203
    

    At this point, notice the point that schema for partitioned table is maintained at partition level and it's expected honour the partition schema and not table schema.

    Notice the difference down in the columns here. ALTER did not change the column definitions for existing partitions.

      hive> describe formatted test_insert;
        OK
        # col_name              data_type               comment             
    
        fname                   string                                      
        lname                   string                                      
        day2                    int                                         
    
        # Partition Information      
        # col_name              data_type               comment             
    
        day                     int    
    
        hive> describe formatted test_insert partition (day=20170202);
        OK
        # col_name              data_type               comment             
    
        fname                   string                                      
        lname                   string                                      
    
        # Partition Information      
        # col_name              data_type               comment             
    
        day                     int                           
    

    Have tweaked your above query, to create NEW partition, so that schema will be copied over from table to newly created partition.

     hive> insert overwrite table test_insert partition (day)
            > select
            >     fname,
            >     lname,
            >     day as day2,
            >     case 
            >         when day <= 20170202 then 19999999
            >         when day > 20170202 then day
            >     end as day
            > from test_insert;
    
    
        hive> select * from test_insert;
        OK
        f   l   20170202    19999999
        f   l   NULL    20170202
        f   l   NULL    20170203
        Time taken: 0.224 seconds, Fetched: 3 row(s)
    

    Hope this is clear.! If you want to change the schema for all existing partitions try below command and then Try insert.

    ALTER TABLE test_insert  partition(day) ADD COLUMNS ( day2 string);