Search code examples
mysqlsql-order-byquery-optimizationsql-likesql-execution-plan

What to prefer in query optimization: Using filesort or more rows examined


I'm trying to optimize this mysql query using EXPLAIN. Can somebody please help me out over here?

EXPLAIN SELECT * FROM keyword
WHERE keyword LIKE "panasonic%"
AND keyword != "panasonic"
AND price < 3230 AND price > 3370
ORDER BY price DESC
LIMIT 99

Basically I want to find out the keywords which start with "some keyword" but don't exactly match it, and whose price is not in some particular range. And oh, I have to get them in descending order of price (which is causing the problem here).

Explain Output:

id: 1
select_type: SIMPLE
table: keyword
type: range
possible_keys: PRIMARY, keyword_price, price_keyword
key: keyword_price
key_len: 765
ref: NULL
rows: 24
Extra: Using where; Using filesort

Indexes Key_name : column_names

PRIMARY: keyword
keyword_price: keyword, price
price_keyword: price, keyword

Now if I give the hint to use index, and change the query to

EXPLAIN SELECT * FROM keyword USE INDEX (price_keyword)
WHERE keyword LIKE "panasonic%"
AND keyword != "panasonic"
AND price < 3230 AND price > 3370
ORDER BY price DESC
LIMIT 99

Explain Output changes to

id: 1
select_type: SIMPLE
table: keyword
type: index
possible_keys: price_keyword
key: price_keyword
key_len: 790
ref: NULL
rows: 1043044 (WHAT THE ????)
Extra: Using where

The explain output shows the number of rows has increased exponentially but the "Using filesort" has gone.

Which query is better in this case? Can that "rows examined" column be deceptive?

Regards


Solution

  • Which query is better in this case?

    In this specific case, I expect the first query to be better.

    Option A) Pull the ~24 rows that start with the indicated keyword and sorting them by price (note that filesort is an algorithm, and doesn't indicate that the sort is done on disk),

    Option B) Pull ~1 million rows in order of price and checking them all to see if they meet the keyword constraint (until you get 99 of them).

    Having said that, if your keyword was 's', or your data is not uniform, the better option could reverse. At any rate, I imagine that for most applications, Option A) is the winner.

    Can that "rows examined" column be deceptive?

    Definitely. The rows examined in the EXPLAIN is an estimate. Best to check the output of something like

    SHOW SESSION STATUS LIKE 'handler_%'

    to see how many rows you are actually pulling.