Search code examples
performancedatabase-designmulti-tenantcockroachdb

How to structure a multi-tenant system in CockroachDB to ensure good performance?


We have a multi-tenant system whose data we're starting to migrate to CockroachDB. We will often need to join data belonging to a single tenant from across multiple tables. In more seldom cases, we will also read data belonging multiple/all tenants - but that doesn't happen as often, and thus is not as important seen from a performance perspective.

We thought table interleaving might have been part of a good solution - i.e.: have a tenant "parent" table, and then interleave other tables into that table, to increase chance data belonging to same tenant end up in same range. But whether that was a good idea or not, doesn't matter now, as interleaving has been deprecated.

Creating a separate schema or database per tenant, might be one solution. But as we already have one schema per microservice, that would lead to an explosion of schemas (we have thousands of tenants). Thus some design with a "discriminator" column telling which tenant a given row of data belongs to is preferable.

What would be a good design, that is both nice to work with and has good performance? E.g. would it help to use composite keys with the tenant-ID as the first part of the key? I guess that might at least ensure that data from the same tenant in a single table is located next to each other, and thus more likely to end up in the same range? That won't help when doing joins across tables though (which is what I had expected interleaving to help with).


Solution

  • Creating a separate schema or database per tenant, might be one solution. But as we already have one schema per microservice, that would lead to an explosion of schemas (we have thousands of tenants). Thus some design with a "discriminator" column telling which tenant a given row of data belongs to is preferable.

    Either of these approaches will be appropriate in terms of performance. In fact, they'll result in very similar key encodings under the hood in CRDB's distributed key-value store.

    That said, CockroachDB has a soft-limit of around 10k databases, so if you already have thousands of users, a database-per-user may be inadvisable.

    So including a tenant-id as the prefix of each table's primary key and secondary indexes is likely the best approach. It will ensure that data from the same tenant in a single table is located next to each other for the purposes of optimizing scans and multi-row read-write transactions.