Search code examples
sqlhivehiveqlhive-partitionshiveddl

Data Loaded wrongly into Hive Partitioned table after adding a new column using ALTER


I already have a Hive partitioned table. I needed to add a new column to the table, so i used ALTER to add the column like below.

ALTER TABLE TABLE1 ADD COLUMNS(COLUMN6 STRING);

I have my final table load query like this:

INSERT OVERWRITE table Final table  PARTITION(COLUMN4, COLUMN5)
select
stg.Column1,
stg.Column2,
stg.Column3,
stg.Column4(Partition Column),Field Name:Code Sample value - YAHOO.COM
stg.Column5(Partition Column),Field Name:Date Sample Value - 2021-06-25
stg.Column6(New Column)       Field Name:reason     sample value - Adjustment
from (
         select fee.* from (
             select 
               fees.* , 
               ROW_NUMBER() OVER (PARTITION BY fees.Column1 ORDER BY fees.Column3 DESC) as RNK
             from Stage table fee
         ) fee
         where RNK = 1
     ) stg
     left join (
         select Column1 from Final table
         where Column5(date) in (select distinct column5(date) from Stage table)
     ) TGT
     on tgt.Column1(id) = stg.Column1(id) where tgt.column1 is null 
UNION
select 
tgt.column1(id),
tgt.column2,
tgt.column3,
tgt.column4(partiton column),
tgt.column5(partiton column-date),
tgt.column6(New column)
from 
Final Table TGT
      WHERE TGT.Column5(date) in (select distinct column5(date) from Stage table);"

Now when my job ran today, and when i try to query the final table, i get the below error

Invalid partition value 'Adjustment' for DATE partition key: Code=2021-06-25/date=Adjustment

I can figure out something wrong happend around the partition column but unable to figure out what went wrong..Can someone help?


Solution

  • Partition columns should be the last ones in the select. When you add new column it is being added as the last non-partition column, partition columns remain the last ones, they are not stored in the datafiles, only metadata contains information about partitions. All other columns order also matters, it should match table DDL, check it using DESCRIBE FORMATTED table_name.

    INSERT OVERWRITE table Final table  PARTITION(COLUMN4, COLUMN5)
    select
    stg.Column1,
    stg.Column2,
    stg.Column3,
    stg.Column6 (New column) ------------New column
    stg.Column4(Partition Column)  ---partition columns
    stg.Column5(Partition Column)
    ...