Search code examples
postgresqlcitus

Can I change the distribution method on an existing Citus table?


During a migration from MySQL into a Citus cluster, I used the range distribution method. The migration is complete, but now I'd like to change the distribution method to hash.

Is there a way to change the distribution method from range to hash for an existing table with data already in it?

I came up with the following procedure, but am not sure it's valid:

  1. Update the minvalue and maxvalue columns of the pg_dist_shard table for all shards being changed
  2. Update the shard storage type column of the pg_dist_partition table from r to h
  3. COMMIT;

Solution

  • That is a good question. Currently, Citus does not provide a direct way to change partition type of existing data.

    In range partitioning, records are placed in shards according to their partition column value and shard min/max values. If a record x resides in shard y, then it means y.minvalue <= x.partition_column <= y.maxvalue.

    In hash partitioning, the partition column is hashed and records are routed according to this hashed value. Therefore, min/max values you see in pg_dist_shard are the boundary values for the result of the hash function. In this case y.minvalue <= hash(x.partition_column) <= y.maxvalue.

    Therefore, doing the changes you have mentioned would end up with an incorrect distribution. In order to switch from range partition to hash partition, the data should be re-distributed. To do that, I suggest reloading the data to an empty hash-partitioned table.

    For more information, you can refer to Working with Distributed Tables and Hash Distribution sections of Citus Documentation.