Hi how to overwrite columns value by selecting same partition table in hive.
I have created table by executing below query
CREATE TABLE user (fname string,lname string) partitioned By (day int);
And i insert the data , after inserting data into table . I executed select query it looks like below:
AA AA 20170201
BB BB 20170201
CC CC 20170201
DD DD 20170202
EE EE 20170203
As per my requirement, I want to add one more column to my table(user) ,with the help of below query I added.
ALTER TABLE user ADD COLUMNS ( day2 int);
After adding column,my table look like below
AA AA NULL 20170201
BB BB NULL 20170201
CC CC NULL 20170201
DD DD NULL 20170202
EE EE NULL 20170203
But I want table like.
AA AA 20170201 20170202
BB BB 20170201 20170202
CC CC 20170201 20170202
DD DD 20170202 20170202
EE EE 20170203 20170203
So I executed below query
insert overwrite table user partition (day)
select
fname,
lname,
day as day2,
case
when day <= 20170202 then 20170202
when day > 20170202 then day
end as day
from user;
then I have executed select query like below
select * from user;
result is :
AA AA NULL 20170201
BB BB NULL 20170201
CC CC NULL 20170201
AA AA NULL 20170202
BB BB NULL 20170202
CC CC NULL 20170202
DD DD NULL 20170202
EE EE NULL 20170203
why i am getting null values.Can please let me know anything I missed out, let how to achieve this.i.e
AA AA 20170201 20170202
BB BB 20170201 20170202
CC CC 20170201 20170202
DD DD 20170202 20170202
EE EE 20170203 20170203
Following the behaviour of partitioned Hive Table, HiVE's schema definitions are maintained at partition level.So older partition may not have received schema updates.
Detailed Query
hive> drop table test_insert;
OK
hive> CREATE TABLE test_insert (fname string,lname string) partitioned By (day int);
OK
hive> insert into test_insert partition (day=20170202) values('f','l');
OK
hive> insert into test_insert partition (day=20170203) values('f','l');
OK
hive> select * from test_insert;
OK
f l 20170202
f l 20170203
Time taken: 0.148 seconds, Fetched: 2 row(s)
hive> ALTER TABLE test_insert ADD COLUMNS ( day2 int);
OK
Time taken: 0.193 seconds
hive> select * from test_insert;
OK
f l NULL 20170202
f l NULL 20170203
At this point, notice the point that schema for partitioned table is maintained at partition level and it's expected honour the partition schema and not table schema.
Notice the difference down in the columns here. ALTER did not change the column definitions for existing partitions.
hive> describe formatted test_insert;
OK
# col_name data_type comment
fname string
lname string
day2 int
# Partition Information
# col_name data_type comment
day int
hive> describe formatted test_insert partition (day=20170202);
OK
# col_name data_type comment
fname string
lname string
# Partition Information
# col_name data_type comment
day int
Have tweaked your above query, to create NEW partition, so that schema will be copied over from table to newly created partition.
hive> insert overwrite table test_insert partition (day)
> select
> fname,
> lname,
> day as day2,
> case
> when day <= 20170202 then 19999999
> when day > 20170202 then day
> end as day
> from test_insert;
hive> select * from test_insert;
OK
f l 20170202 19999999
f l NULL 20170202
f l NULL 20170203
Time taken: 0.224 seconds, Fetched: 3 row(s)
Hope this is clear.! If you want to change the schema for all existing partitions try below command and then Try insert.
ALTER TABLE test_insert partition(day) ADD COLUMNS ( day2 string);