Search code examples
mysqlpostgresqlforeign-data-wrappercitus

Does Citus support creating shards using mysql_fdw?


The Citus documentation for the master_get_table_metadata function states:

part_storage_type: Type of storage used for the table. May be ‘t’ (standard table), ‘f’ (foreign table) or ‘c’ (columnar table).

But I searched the entire documentation and found no examples of how to work with tables distributed using the ‘f’ (foreign table) partition storage type.

I suppose the initial foreign table could be created using:

CREATE FOREIGN TABLE audit (
    id integer NOT NULL,
    ctime timestamp without time zone DEFAULT now() NOT NULL,
    site_id integer NOT NULL,
    client_id integer,
    done_time timestamp without time zone,
    status text DEFAULT 'NEW' NOT NULL,
    file_id character varying(16) DEFAULT ''::character varying NOT NULL
) SERVER mysql_svr OPTIONS (dbname 'constructor', table_name 'audit');

But how do I distribute such a table after creating it? How will the shards be created?

Update

I have found this

FOREIGN (‘f’) — Indicates that shard stores foreign data. (Used by distributed file_fdw tables)

So my question remains: is it possible to use other foreign data wrappers, such as mysql_fdw?


Solution

  • Creating distributed foreign tables has only partial support right now within Citus. Let's take your example:

    CREATE FOREIGN TABLE audit (
        id integer NOT NULL,
        ctime timestamp without time zone DEFAULT now() NOT NULL,
        site_id integer NOT NULL,
        client_id integer,
        done_time timestamp without time zone,
        status text DEFAULT 'NEW' NOT NULL,
        file_id character varying(16) DEFAULT ''::character varying NOT NULL
    ) SERVER mysql_svr
     OPTIONS (dbname 'constructor', table_name 'audit');
    

    You can now distribute this using:

    SELECT * FROM master_create_distributed_table('audit', 'id', 'append');
    

    And create shards using:

    SELECT master_create_worker_shards('audit', <shard_count>);
    

    However, each shard created on the worker node will inherit the same options as the master node. Thus, each shard will point, in this example, to dbname 'constructor', and foreign table 'audit'. There would be limited value in creating such a distribution, since even though Citus will issue parallel queries, they will all again be sent to a single node and table.

    To construct a more useful example, let's say you already have some (let's say 8) sharded MySQL tables, e.g. audit_1, audit_2, ..., audit_8.

    You can construct the same table as above, and create a distributed setup like so:

    SELECT * FROM master_create_distributed_table('audit', 'id', 'append');
    

    And create shards using:

    SELECT master_create_worker_shards('audit', 8);
    

    You would now need to log into each Citus worker node, and update each shard to point to it's relevant MySQL shard.

    e.g: ALTER TABLE audit_100208 OPTIONS (SET table_name 'audit_1');

    If you have tables spread across multiple nodes or databases, you'd need to manually create specific servers for each foreign node on every Citus worker node.

    There are caveats here to be careful of. For one, we marked the distribution as 'append', because we don't know the underlying distribution of the foreign table. If you use hash, you may get wrong partition pruning via Citus. There may be other caveats too, as this isn't a use-case we actively support or have tested. From a historical perspective, we primarily used this as a proof-of-concept to try reading flat-files spread across multiple nodes.

    ** Edit ** Adding responses to the other questions by Eugen. Also, please note, such Q/A is best suited for the mailing list here: https://groups.google.com/forum/#!forum/citus-users

    • By 'partial support', I meant we will push down the foreign table creation, but will not automatically map different foreign table settings to different shards.

    • SQL and PostgreSQL has a wide range of features, and we don't currently support all of them. We are compiling a list of available features, but in the meantime let us know if there is any features you are interested in.

    • We do automatically create shards with storage-type 'f', when you issue master_create_distributed_table.