MemSQL - performance implications of large numbers of databases

I am looking for some guidance around any implications of creating large numbers of databases within MemSQL.

My use case requires that each user within my application has access to n tables. One option is to create a DB per user (so that listing tables etc is simple with "SHOW TABLES", also its good layer of abstraction for admin and security). Option 2 is to have a single DB containing all users and tables but that adds complexity to the controlling logic.

Questions:

What are the performance implications (if any) of having a many DB approach rather than a single DB?

Appreciate any guidance.

Solution

Each database in memsql has it's own transaction buffer, with size set by the transaction-buffer variable in memsql.cnf. For single box the default size is 128m, i.e. each new database will consume an additional 128 megabytes of RAM. This puts a hard limit on the number of databases you can create: how much RAM you have.

In distributed memsql, both the aggregator and leaf nodes use 64m per database by default. But, the leaves have one database per partition, and each leaf typically has one partition per CPU core. This means each database created on the memsql cluster will use CORES * 64m by default on each leaf, e.g. 512m per database on an 8 core leaf node.

If you want to create a huge number of databases, you will have to lower the size of the transaction buffer. That buffer is used for writes, so unless your write workload is highly concurrent or creates huge transactions, it's safe to lower that default value.

For your use case, if you expect only a small fraction of your users to be writing to their database at any given time, a lot of that transaction buffer space will be idle if they each have their own database. Putting all users in one database will let them share the transaction buffer, reducing wasted memory.