Hive: after increasing the bucket count, how is existing data split across the new buckets?


In Hive, I have an ORC-format table with 10 buckets that already holds 1 TB of data. If I increase the bucket count, will the existing data be split across the new buckets automatically, or do I need to reload the data into the table? Is there any way to alter the bucket count? I am new to bucketing concepts. Can someone help answer this question?


Solution

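  • If you use ALTER TABLE mytable CLUSTERED BY (my_field) INTO 20 BUCKETS (20 is only an example of a new count), existing data will not be redistributed. The statement changes only the table's metadata and leaves the existing files as they are.
    Only rows inserted afterwards are bucketed according to the new count, so the table ends up holding a mix of the old and new layouts.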

    If you want a clean method, follow these steps:

    1. Create a new table with the same columns and the new bucket count.
    2. Insert the data from the old table into the new table.
    3. Drop the old table.

    This redistributes all of the data into the new buckets.
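
The three steps above can be sketched in HiveQL roughly as follows. This is a minimal sketch, not a drop-in script: the table name mytable_rebucketed, the columns id and payload, and the bucket count of 20 are hypothetical placeholders, and the column list has to match the real table. The final rename is an optional extra that is not part of the steps above.

```sql
-- Assumption: on Hive 1.x and earlier, bucketing must be enforced explicitly
-- before the insert; Hive 2.0 and later always enforce it.
-- SET hive.enforce.bucketing = true;

-- 1. Create a new table with the same columns and the new bucket count.
--    (id, my_field, payload are placeholder columns; 20 is an example count.)
CREATE TABLE mytable_rebucketed (
  id       BIGINT,
  my_field STRING,
  payload  STRING
)
CLUSTERED BY (my_field) INTO 20 BUCKETS
STORED AS ORC;

-- 2. Copy the data. Hive hashes my_field for every row and writes the row
--    to the matching one of the 20 new buckets.
INSERT OVERWRITE TABLE mytable_rebucketed
SELECT id, my_field, payload
FROM mytable;

-- 3. Drop the old table, ideally after comparing row counts of both tables.
DROP TABLE mytable;

-- Optional: give the new table the old name so existing queries keep working.
ALTER TABLE mytable_rebucketed RENAME TO mytable;
```

With 1 TB of data the insert in step 2 is a full rewrite of the table, so the cluster needs enough free space to hold both copies until the old table is dropped.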