Search code examples
mysqlquery-optimizationbinary-search

Why mysql uses binary search in `IN` clause instead of hash lookup?


Hi there I can see the book High performance MySQL - O'Reilly states that MySQL IN clause use binary search.

Eg:

SELECT name FROM users WHERE city_id IN (2, 1, 6, 4, 3, …, 100

We have 100 cities inside IN() clause. Ideally, we can convert 100 city IDs into a hashmap and do look up with O(1) right?

So why MySQL choose binary search and is there any way better than using IN clause above?


Solution

  • Short answer:

    The efficiency of that clause is only a small part of the query execution.

    Long answer:

    • Building the hash table takes time. But so does the sort to build a binary tree. Depending on other details of the code, either might be faster.

    • If the values had been strings -- and had a collation, that would add a big wrinkle. It may be impossible to use "hash".

    • MySQL was (and, to some extent, still is) "lean and mean". That is, it would realize that binary tree works for all cases, so don't bother with having two ways and having to decide which way is better.

    • Newer versions of MySQL/MariaDB are using other optimizations (such as based on the length of the list).

    • If that list of ids came from a SELECT id ..., you would probably be much better off using a JOIN rather than two separate queries.

    • Which edition of that book are you reading? There are at least three. The first was probably written before InnoDB and Collations and lots of improved optimizations. (I would not trust it anymore.)