During a migration from MySQL into a Citus cluster, I used the range
distribution method. The migration is complete, but now I'd like to change the distribution method to hash
.
Is there a way to change the distribution method from range
to hash
for an existing table with data already in it?
I came up with the following procedure, but am not sure it's valid:
minvalue
and maxvalue
columns of the pg_dist_shard
table for all shards being changedpg_dist_partition
table from r
to h
COMMIT;
That is a good question. Currently, Citus does not provide a direct way to change partition type of existing data.
In range partitioning, records are placed in shards according to their partition column value and shard min/max values. If a record x resides in shard y, then it means y.minvalue <= x.partition_column <= y.maxvalue
.
In hash partitioning, the partition column is hashed and records are routed according to this hashed value. Therefore, min/max values you see in pg_dist_shard
are the boundary values for the result of the hash function. In this case y.minvalue <= hash(x.partition_column) <= y.maxvalue
.
Therefore, doing the changes you have mentioned would end up with an incorrect distribution. In order to switch from range partition to hash partition, the data should be re-distributed. To do that, I suggest reloading the data to an empty hash-partitioned table.
For more information, you can refer to Working with Distributed Tables and Hash Distribution sections of Citus Documentation.