Search code examples
hadoophivehdfshqlpartition

Add a new partition in hive external table and update the existing partition to column of the table to non-partition column


I have below existing table emp with partition column as as_of_date(current_date -1).

CREATE EXTERNAL TABLE IF NOT EXISTS emp(
  student_ID INT, 
  name STRING)
  partitioned by (as_of_date date)
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION '/user/emp';

Below are the existing partition path

user/emp/as_of_date=2021-09-02
user/emp/as_of_date=2021-09-03
user/emp/as_of_date=2021-09-04

In emp table, I have to add new partition column as businessdate(current_date) and change partition column (as_of_date) to non-partition column.

Expected output should be as below.

describe table emp;
CREATE EXTERNAL TABLE IF NOT EXISTS emp(
student_ID INT, 
Name STRING, 
as_of_date date)
  partitioned by (businessdate date)
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION '/user/emp';

After update, below will be hdfs path

user/emp/buinessdate=2021-09-03
user/emp/buinessdate=2021-09-04
user/emp/businessdate=2021-09-05

Expected output table:

|student_ID |name |as_of_date | business_date |
|--| --- | --- |----|
|1 | Sta |2021-09-02| 2021-09-03 |
|2 |Danny|2021-09-03| 2021-09-04 |
|3 |Elle |2021-09-04| 2021-09-05 |

Solution

  • Create new table, load data from old table, remove old table, rename new table.

    --1 Create new table emp1

    CREATE EXTERNAL TABLE emp1(
    student_ID INT, 
    Name STRING, 
    as_of_date date)
      partitioned by (businessdate date)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      STORED AS TEXTFILE
      LOCATION '/user/emp1';
    

    --2 Load data into emp1 from the emp with new partition column calculated

    --dynamic partition mode
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    
    insert overwrite table emp1 partition (business_date)
    select 
         student_ID, 
         Name, 
         as_of_date,
         date_add(as_of_date,1) as business_date
     from emp;
    

    Now you can drop old (make it managed first to drop location as well) and rename new table if necessary.