Search code examples
mysqlselectindexingquery-optimization

MySQL where or composite index : will it be faster when performing two separate search?


I have the following table:

CREATE TABLE `FooBar` (
  `id` int,
  `foo` int,
  `bar1` int,
  `bar2` int,
  PRIMARY KEY (`id`),
  KEY `foobar1` (`foo`, `bar1`),
  KEY `foobar2` (`foo`, `bar2`)
) ENGINE=InnoDB;

Now I have the following select query:

select * from FooBar where foo=1 and (bar1=2 or bar2=2);

And the alternative two consecutive selects

select * from FooBar where foo=1 and bar1=2;
select * from FooBar where foo=1 and bar2=2;

Will the time of the single select with "or" be significantly faster, slower or around the same comparing with the total time of the two consecutive selects?


Solution

  • Neither.

    • OR kills performance -- usually leading to ignoring the index and scanning the whole table.
    • Two separate queries -- there is lots of overhead in handling a statement; it is usually much better to combine two statements (unless that leads to other inefficiencies).

    The usual speedup for OR is UNION:

    ( select * from FooBar where foo=1 and bar1=2 )
    UNION ALL
    ( select * from FooBar where foo=1 and bar2=2 )
    ;
    

    UNION ALL is faster than UNION DISTINCT, but ALL could lead to duplicate rows. Pick accordingly.

    (If there will be ORDER BY or pagination, the discussion gets longer.)

    If bar1 and bar2 is an example of an "array" spread across columns, this becomes an argument for not designing the schema that way. Instead, this might be better as a table with one "bar" column, and have (potentially) multiple rows for each foo.

    A simple example of that is a table of persons where you want to include their phone number(s). It is much better to have a table with (person_id, phone_num) -- and the numbers could be for cell, landline, fax, work, home, etc, etc. It is open-ended, zero or more, etc.

    Timing... True, having just a few rows makes it difficult to predict performance as the data grows. Here's one trick; it counts the rows involved, hence can easily discover that OR touches every row in the table: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#handler_counts -- 20 rows should be fine. If a Handler count says 19 or 20, then it did a table scan. About 40 would indicate 2 scans. I predict

    • 20 for your OR query;
    • 2 for your two separate Selects (but this does not take into account the overhead of every query)
    • 2 for my UNION ALL
    • 4 or 6 for my UNION DISTINCT, plus 2 writes (for a necessary temp table).

    The 20 clearly won't scale well to millions of rows; all the rest will.

    EXPLAIN has a lot of issues with counts. Still, in this case, it might be nearly as good as my Handler technique.

    There are no tools that can tell you "what you should have done". (Stackoverflow comes close, but it is very labor intensive.)