Tags: mysql, sql, database-design, scalability, entity-attribute-value

Meta table vs. table with many fields: performance at large scale


Here is a concrete example:

WordPress stores user information (meta) in a table called wp_usermeta, where each row has a meta_key field (e.g. first_name) and a meta_value field (e.g. John).

However, after only 50 or so users, the table already holds about 1219 records.

So, my question is: at large scale, performance-wise, would it be better to have a table with each meta as a field (column), or a table like WordPress's with each meta as a row?

Indexes are properly set in both cases. There is little to no need to add new metas. Keep in mind that a table like wp_usermeta must use a text/longtext field type (large footprint) in order to accommodate any type of data that could be entered.

My assumptions are that the WordPress approach is only good when you don't know what the user might need. Otherwise:

  • Retrieving all the meta for a user requires more I/O because the fields aren't stored in a single row, and the generic meta_value field isn't optimised for the data it holds.
  • You can't really have an index on the meta_value field without suffering major drawbacks (indexing a longtext? Unless it's a partial index, but then how long?)
  • Soon your database is cluttered with many rows, slowing down searches even for the most specific meta.
  • It isn't developer-friendly. You can't really write a simple join query to get everything you need in a properly displayed form.

I may be missing a point though. I'm not a database engineer, and I know only the basics of SQL.


Solution

  • You're talking about Entity-Attribute-Value.

    - Entity    = User, in your WordPress example
    - Attribute = 'First Name', 'Last Name', etc.
    - Value     = 'John', 'Smith', etc.
    

    Such a schema is very good at allowing a dynamic number of Attributes for any given Entity. You don't need to change the schema to add an Attribute. Depending on the queries, the new attributes can often be used without changing any SQL at all.

    It's also perfectly fast at retrieving those attribute values, provided that you know the Entity and the Attribute that you're looking for. It's just a big, fancy key-value-pair type of set-up.

    It is, however, not so good where you need to search the records based on the Value contents, such as "get me all users called 'John Smith'". That is trivial to ask in English and trivial to code against a 'normal' table: first_name = 'John' AND last_name = 'Smith'. But it is non-trivial to write in SQL against EAV, and relative performance is awful: get all the Johns, then all the Smiths, then intersect them to get the Entities that match both.

    There is a lot said about EAV online, so I won't go into massive detail here. But a general rule of thumb is: if you can avoid it, you probably should.
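
To make the 'John Smith' comparison concrete, here is a minimal sketch of both layouts. It rests on several assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the `usermeta` table is a simplified version of wp_usermeta (the real table has more columns), and `users_wide` and the index name are hypothetical. It shows the shape of the queries only; it says nothing about real-world timings.

```python
import sqlite3

# SQLite stands in for MySQL here. "usermeta" is a simplified take on
# wp_usermeta; "users_wide" and "idx_name" are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- EAV layout: one row per (user, attribute) pair
    CREATE TABLE usermeta (
        user_id    INTEGER NOT NULL,
        meta_key   TEXT    NOT NULL,
        meta_value TEXT,
        PRIMARY KEY (user_id, meta_key)
    );
    -- Conventional layout: one row per user, one column per attribute
    CREATE TABLE users_wide (
        user_id    INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name  TEXT
    );
    CREATE INDEX idx_name ON users_wide (last_name, first_name);
""")

people = [(1, "John", "Smith"), (2, "John", "Doe"), (3, "Jane", "Smith")]
for user_id, first, last in people:
    conn.execute("INSERT INTO users_wide VALUES (?, ?, ?)", (user_id, first, last))
    conn.executemany(
        "INSERT INTO usermeta VALUES (?, ?, ?)",
        [(user_id, "first_name", first), (user_id, "last_name", last)],
    )

# Known Entity + known Attribute: a plain keyed lookup, which EAV handles well.
eav_lookup = conn.execute(
    "SELECT meta_value FROM usermeta WHERE user_id = ? AND meta_key = ?",
    (3, "first_name"),
).fetchone()[0]

# Search by Value, conventional table: one predicate per column.
wide_ids = [row[0] for row in conn.execute(
    "SELECT user_id FROM users_wide WHERE first_name = ? AND last_name = ?",
    ("John", "Smith"),
)]

# Search by Value, EAV: one self-join per attribute being tested.
eav_ids = [row[0] for row in conn.execute(
    """
    SELECT f.user_id
    FROM usermeta AS f
    JOIN usermeta AS l ON l.user_id = f.user_id
    WHERE f.meta_key = 'first_name' AND f.meta_value = ?
      AND l.meta_key = 'last_name'  AND l.meta_value = ?
    """,
    ("John", "Smith"),
)]

print(eav_lookup, wide_ids, eav_ids)
```

Both searches return the same user, but the EAV version needs an extra self-join for every additional attribute in the condition, which is the "get all the Johns, then all the Smiths, then intersect" work described above.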