Search code examples
mysqlmulti-tenantsaasshardingentity-attribute-value

MySQL User Defined Values - EAV vs Sharding with Many tables


I am currently in development of a multi-tenant system that as core functionality of the system allows the user to define custom types. So for example, they would define an Event, Account, Order, Shipment whatever they choose. Every user in the system will have different definitions for what they want to manage in terms of fields. So for one user an Order may have an order number, status and due date where as for another user it might have 10 fields.

The developers I am working with want to use EAV to store this data. I am opposed to this idea. I have read many articles on this site as well as all over the internet listing the disadvantages of this anti-design pattern, but none mentioning the approach i'm thinking of taking. I am trying to build this application such that it is scalable from the beginning.

When I do the math, if I have 1000 tenants, with an average of 5 types each (5000 types). Each type has 1000 records for example (5,000,000 records). Each record has on average 5 fields gives me a total of 25,000,000 rows at the lowest level of the EAV model.

A down stream process will also be binding each individual users data to a jquery grid, so first fetching this data and transposing the data just seems so costly to me. what happens when you have 10k tenants or 50k tenants... I understand that MySQL can handle this type of thing when optimized, however it just seems like im shooting myself in the foot.

I want to do it another way. However, I have a bad gut feeling about what I am proposing as it goes against everything I know, so I would like some real experts with practical knowledge to validate or criticise my approach. If you validate, please tell me what I need to do to support it and get it working. If you criticise, please tell me the pitfalls I will hit in the short term and the long term.

My Proposal.

  1. Shard the system using domain partitioning such that there is a maximum set of tenants in any particular shard. The master catalogue will reference which tenant belongs to which shard
  2. For each Shard, when a user defines a type, create a new table to hold this type. Hold a mapping table in the shard, which links the user to his defined types (custom tables).

This essentially means that I am going to have a handful of core tables in one shard and 1000's of custom tables.

Now to me, usually having that many tables in a database usually tells me that there is something wrong with the schema or that something has been designed incorrectly, but for this scenario, I'm just curious to know if it is a feasible approach. In my previous example, it would mean that I have 5000 tables in the shard, with only 1000 rows each. which to me seems a better approach than using EAV. Based on the user, you find the Type and you bind the data to the grid.

Some Notes to consider

  1. The multitenant architecture allows users to have their own users. So potentially I have 1000 subscribers, but 5000 users. So the database connections need to be managed. Will I hit problems managing the connections?

  2. Will I hit table caching related problems? Will I have problems flushing tables?

  3. Where may I hit performance issues with this design? I understand that the master catalouge database may be a bottleneck, but the load on this database will not be too heavy.

  4. Development has started already, don't ask me to change to a NoSQL database!

Another suggestion was to also continue using EAV but within the shard. What do you think of this idea?

Please don't pull any punches! I need to hear it all. Thanks in advance.


Solution

  • I think in terms of scaling the data, you will find that managing thousands of relatively small custom tables will do better than using EAV. I have consulted for customers with over 100,000 tables on a single MySQL instance.

    You will run into different scalability issues when you have tens of thousands of tables on an instance, but if you've already got an architecture that supports sharding, you're ready to further subdivide the users so you don't have too many on any one instance.

    The catalog tables are really good to put into cache (e.g. memcached) because the mapping of user to shard instance changes very infrequently. That will reduce the load on the catalog.

    I'd also look into MySQL's partitioning for the catalog, and the table that maps users to their custom tables. As well as any other common (non-custom) tables. You can partition any of these by userid, and rely on partition pruning to make multi-tenant tables act like much smaller tables.