Search code examples
hadoophivehdfshive-partitionshiveddl

What happens if I move Hive table data files before moving the table?


I am trying to move the location of a table to a new directory. Let's say the original location is /data/dir. For example, I am trying something like this:

hadoop fs -mkdir /data/dir_bkp
hadoop fs -mv /data/dir/* /data/dir_bkp

I then do hive commands such as:

ALTER TABLE db.mytable RENAME TO db.mytable_bkp;
ALTER TABLE db.mytable_bkp SET LOCATION /data/dir_bkp;

Is it fine to move the directory files before changing the location of the table? After I run these commands, will the table mytable_bkp be populated as it was before?


Solution

  • After you executed mv command, your original table will become empty. because mv removed data files.

    After you renamed table, it is empty, because it's location is empty.

    After you executed ALTER TABLE SET LOCATION - the table is empty because partitions are mounted to old locations (now empty). Sorry for misleading you in this step previously. After rename table, partitions remain as they were before rename. Each partition can normally have it's own location outside table location.

    If table is MANAGED, make it EXTERNAL:

    alter table table_name SET TBLPROPERTIES('EXTERNAL'='TRUE'); 
    

    Now drop table + create table with new location and run MSCK to create partitions:

    MSCK [REPAIR] TABLE tablename;
    

    If you are on Amazon EMR, run ALTER TABLE tablename RECOVER PARTITIONS; instead of MSCK