Search code examples
jquerymysqlindexingentity-attribute-value

Best way to search properties in MySQL


Hi have a table of users where we store some top level fields like userid, createdOn, email

And a second table called properties that stores a key value pair of additional properties. Like userid, key, value

Keys may be things like name, phone , gender etc.

It’s not possible to flatten this because we may add new properties and don’t want to alter the table.

We have over 10 milllion rows in the users table and over 100 million in the properties table.

What’s the best way to query a system like this. We currently do a join on the tables and a basic where clause.

Are there any more advanced ways of working with data like this? We often have to search on multiple fields, like name like ‘%jo’ and gender=male and country =usa and foo=bar


Solution

  • What you're describing is usually called Entity-Attribute-Value, or EAV. It's a pretty common for the situation you describe, where you have more potential properties than you could make normal columns for.

    CREATE TABLE eav_table (
      entity INT NOT NULL,
      property VARCHAR(64) NOT NULL,
      value VARCHAR(64),
      PRIMARY KEY (entity, property),
      KEY (entity, property, value)
    );
    

    The first two columns are the primary key of the table because you have only one row for each such pair. But it's useful to have a secondary index for all three columns, because that will be the one the query reads most often.

    Querying multiple values with equality conditions is okay. MySQL can do tuple comparisons.

    SELECT ...
    FROM eav_table
    WHERE entity = 1234 AND (property, value) IN (
     ('gender', 'male'),
     ('country', 'usa'),
     ('foo', 'bar')
    )
    

    If there's an index on (entity, property, value), the optimizer will use it, and find the matching rows efficiently.

    The downside is that this syntax does not support LIKE or any other kind of patterns. So you would need to do those more explicitly:

    SELECT ...
    FROM eav_table
    WHERE entity = 1234 AND property = 'name' AND value LIKE '%jo'
    

    The index will still help partially, because entity, property are the left-most columns of the index. But a LIKE pattern with a leading wildcard cannot use an index regardless, so it will have to examine all rows that match the first two columns, and test each one for the pattern. A little less efficient, but at least it will narrow down the search.

    If you also want to do searches like "which entities have the country property usa?" you would want another secondary index with different columns as the left-most columns:

    ALTER TABLE eav_table ADD KEY (property, value);
    

    Then you could search for the property/value and get a set of entities that match:

    SELECT ...
    FROM eav_table
    WHERE (property, value) = ('country', 'usa')
    

    If you have the right indexes to support the searches you need to do, even a table with millions or hundreds of millions of rows works pretty well. But eventually as the table gets larger and larger, you may have to split it into multiple tables or multiple MySQL instances. Planning ahead for databases that grow indefinitely requires you to do some capacity planning and benchmarking.