Search code examples
mysqldatabasemany-to-many

Many-to-Many matching multiple optimization


I have an application which contains a many-to-many relationship. I have a need to select all rows from one table that are associated to all rows from a variable set in the other table.

For example, I would need to select all foo entities that are associated to bar entities A, B, C and E. The user could select 1, 5, 12 or 50 bar entities to filter the foo entities by

The relevant fields from the tables: (ids are using uuid)

/* ~20k rows */
CREATE TABLE `foo` (
   `id` char(36) COLLATE utf8_unicode_ci NOT NULL,
  `title` text COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`id`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

/* ~30k rows */
CREATE TABLE `bar` (
  `id` char(36) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

/* ~150k rows */
CREATE TABLE `foo_bar` (
  `id` char(36) COLLATE utf8_unicode_ci NOT NULL,
  `foo_id` char(36) COLLATE utf8_unicode_ci DEFAULT NULL,
  `bar_id` char(36) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `foo_id_foreign` (`foo_id`),
  KEY `bar_id_foreign` (`bar_id`),
  CONSTRAINT `bar_id_foreign` FOREIGN KEY (`bar_id`) 
      REFERENCES `bar` (`id`) ON DELETE CASCADE,
  CONSTRAINT `foo_id_foreign` FOREIGN KEY (`foo_id`) 
      REFERENCES `foo` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

I have tried two different approaches that I have seen from different SO answers: multiple joins and a subquery. Multiple joins seems to work reasonably well, but doesn't seem like it would be highly scalable. Running a subquery seems like it should scale better but runs for hours.

Multi join. It works, but every additional join increases the elapsed time exponentially, as expected. 3 bars takes around 800ms, which is definitely high. The explain looks reasonable.

select `foo`.* 
from `foo`
inner join foo_bar `fb1` on `fb1`.`foo_id` = `foo`.`id`
inner join bar `b1` on `b1`.`id` = `fb1`.`bar_id` AND `b1`.`id` = :some_uuid1
inner join foo_bar `fb2` on `fb2`.`foo_id` = `foo`.`id`
inner join bar `b2` on `b2`.`id` = `fb2`.`bar_id` AND `b2`.`id` = :some_uuid2
inner join foo_bar `fb3` on `fb3`.`foo_id` = `foo`.`id`
inner join bar `b3` on `b3`.`id` = `fb3`.`bar_id` AND `b3`.`id` = :some_uuid3
group by `foo`.`id`
order by `foo`.`title` asc 
limit 25 offset 0

Subquery. Runs indefinitely. Same effect with where in (subquery) as inner join subquery, though the explain ends up looking a little different.

select `foo`.* 
from `foo`
inner join (
    select `foo_id` 
    from `foo_bar` 
    inner join `bar` 
        on `bar`.`id` = `foo_bar`.`bar_id`
    where `bar`.`id` in (:some_uuid1, :some_uuid2, :some_uuid3) 
    group by `foo_id` 
    having COUNT(*) = 3
) as `subset` on `foo`.`id`  = `subset`.`foo_id`
order by `foo`.`title` asc 
limit 25 offset 0

explain:

id  select_type table   type    key            key_len rows  extra
1   PRIMARY     derived ALL     NULL           NULL    6618  Using temporary; Using filesort
1   PRIMARY     foo     eq_ref  PRIMARY        108     1   
2   DERIVED     bar     const   PRIMARY        108     1     Using index; Using temporary; Using filesort
2   DERIVED     foo_bar ref     bar_id_foreign 109     16094 Using where

My question is are there any optimizations that I could apply to make this situation usable and scalable?


Solution

  • Your normalization is fine. It's good that you have a joining table foo_bar to handle the many to many relationship.

    As far as optimizing your JOIN, you don't need to add a new join each time you want to check for a new id, you can use the IN operator:

    INNER JOIN foo_bar fb1 ON fb1.foo_id = foo.id AND fb1.id 
       IN (some_uuid1, some_uuid2, some_uuid3);
    

    Then, if you want to get rows where it matches all three of those, the whole query would look something like this:

    SELECT foo.id, foo.title
    FROM foo
    INNER JOIN foo_bar fb ON fb.foo_id = foo.id AND fb.id IN (some_uuid1, some_uuid2, some_uuid3)
    GROUP BY foo.id
    HAVING COUNT(*) = 3
    ORDER BY foo.title
    LIMIT 25;