
Does Citus expose the hash function used to prune shards?


This document has a good description of how to prepare range-partitioned data for insertion into a target shard. If I knew the exact hash function, I could similarly prepare data for insertion into hash-distributed tables.

Such a function is hinted at here, but I could not find it where I expected it in the source.

Where does Citus determine the hash function to use during shard pruning?


Solution

  • The answer from metdos helps with the underlying problem (slow data migrations), but it looks like you still want a definitive answer to the original question of "Does Citus expose the hash function it uses?"

    The answer to this question is "No, not directly, but it does expose the cached information about each distributed table and you can use that to discover the hash function, which you'd just need to call". What follows is a sketch of how to do that…

    The function DistributedTableCacheEntry takes a table's OID as its input and returns a struct of cached metadata for that table, including the hash function that would be used for it.

    It's a public function, exposed by the headers Citus installs, so you should be able to link against it and write a C-level PostgreSQL function that hashes a partition value given the table it belongs to. See FastShardPruning for how to use it.

    The signature would probably look like: CREATE FUNCTION citus_hash(distrel regclass, partitionval anyelement) RETURNS integer. Pseudocode:

    1. Call DistributedTableCacheEntry with distrel as argument
    2. Ensure the table is hash-partitioned
    3. Get the hash function from the cache entry
    4. Ensure partitionval is of the expected type
    5. Call the hash function on partitionval and return the result

    See PostgreSQL's own documentation to learn about writing such a function.
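
The five pseudocode steps can be modeled in plain C to show the control flow. This is only a sketch under loud assumptions: MockCacheEntry, mock_cache_lookup, toy_hash_int4, and the status codes are hypothetical stand-ins, not Citus or PostgreSQL APIs, and the toy hash is not the hash Citus uses. In a real extension, step 1 would call DistributedTableCacheEntry and step 5 would invoke the cached hash function through PostgreSQL's function manager.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for PostgreSQL/Citus types. Every name below is hypothetical. */
typedef uint32_t Oid;
typedef int32_t (*HashFn)(int32_t value);

#define METHOD_HASH   'h'
#define METHOD_RANGE  'r'
#define INT4_TYPE_OID 23u   /* PostgreSQL's type OID for int4 */
#define TEXT_TYPE_OID 25u   /* PostgreSQL's type OID for text */

/* Mock of the per-table metadata a cache entry would carry. */
typedef struct MockCacheEntry {
    Oid    relationId;
    char   partitionMethod;
    Oid    partitionColumnType;
    HashFn hashFunction;
} MockCacheEntry;

typedef enum {
    HASH_OK,
    ERR_NOT_DISTRIBUTED,
    ERR_NOT_HASH,
    ERR_TYPE_MISMATCH
} HashStatus;

/* Toy hash (FNV-1a over the 4 bytes of the value).
 * Assumption for illustration only: NOT the function Citus would use. */
static int32_t toy_hash_int4(int32_t value)
{
    uint32_t v = (uint32_t) value;
    uint32_t h = 2166136261u;
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;
    }
    return (int32_t) h;
}

/* Two made-up tables: one hash-distributed, one range-distributed. */
static const MockCacheEntry mock_cache[] = {
    { 16384u, METHOD_HASH,  INT4_TYPE_OID, toy_hash_int4 },
    { 16385u, METHOD_RANGE, INT4_TYPE_OID, NULL },
};

/* Stand-in for the cache lookup; returns NULL for unknown tables. */
static const MockCacheEntry *mock_cache_lookup(Oid relationId)
{
    for (size_t i = 0; i < sizeof(mock_cache) / sizeof(mock_cache[0]); i++) {
        if (mock_cache[i].relationId == relationId)
            return &mock_cache[i];
    }
    return NULL;
}

/* Follows the five pseudocode steps; writes the hash to *result on success. */
HashStatus citus_hash_sketch(Oid distrel, Oid valueType,
                             int32_t partitionval, int32_t *result)
{
    assert(result != NULL);

    /* 1. Look up the cache entry for the table. */
    const MockCacheEntry *entry = mock_cache_lookup(distrel);
    if (entry == NULL)
        return ERR_NOT_DISTRIBUTED;

    /* 2. Ensure the table is hash-partitioned. */
    if (entry->partitionMethod != METHOD_HASH)
        return ERR_NOT_HASH;

    /* 3. Get the hash function from the cache entry. */
    HashFn hash = entry->hashFunction;

    /* 4. Ensure the value has the partition column's type. */
    if (valueType != entry->partitionColumnType)
        return ERR_TYPE_MISMATCH;

    /* 5. Hash the value and hand back the result. */
    *result = hash(partitionval);
    return HASH_OK;
}
```

Calling citus_hash_sketch(16384u, INT4_TYPE_OID, 42, &out) returns HASH_OK and fills out, while the range-distributed mock table (16385u) or a value of the wrong type is rejected before any hashing happens. Returning a status code stands in for the error reporting a real PostgreSQL function would do.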