Sub-query 1:
SELECT * from big_table
where category = 'fruits' and name = 'apple'
order by yyyymmdd desc
Explain:
table | key | extra
big_table | name_yyyymmdd | using where
Looks great!
Sub-query 2:
SELECT * from big_table
where category = 'fruits' and (taste = 'sweet' or wildcard = '*')
order by yyyymmdd desc
Explain:
table | key | extra
big_table | category_yyyymmdd | using where
Looks great!
Now if I combine those with UNION:
SELECT * from big_table
where category = 'fruits' and name = 'apple'
UNION
SELECT * from big_table
where category = 'fruits' and (taste = 'sweet' or wildcard = '*')
Order by yyyymmdd desc
Explain:
table | key | extra
big_table | name | using index condition, using where
big_table | category | using index condition
UNION RESULT| NULL | using temporary; using filesort
Not so good, it uses filesort.
This is a trimmed down version of a more complexed query, here are some facts about the big_table:
yyyymmdd_category_taste_name
but Mysql is not using it.SELECT * FROM big_table
WHERE category = 'fruits'
AND ( name = 'apple'
OR taste = 'sweet'
OR wildcard = '*' )
ORDER BY yyyymmdd DESC
And have INDEX(catgory)
or some index starting with category
. However, if more than about 20% of the table is category = 'fruits'
will probably decide to ignore the index and simply do a table scan. (Since you say there are only 5 categories, I suspect the optimizer will rightly eschew the index.)
Or this might be beneficial: INDEX(category, yyyymmdd)
, in this order.
The UNION
had to do a sort (either in memory on disk, it is not clear) because it was unable to fetch the rows in the desired order.
A composite index INDEX(yyyymmdd, ...)
might be used to avoid the 'filesort', but it won't use any columns after yyyymmdd
.
When constructing a composite index, start with any WHERE
columns compared '='. After that you can add one range or group by
or order by
. More details.
UNION
is often a good choice for avoiding a slow OR
, but in this case it would need three indexes
INDEX(category, name)
INDEX(category, taste)
INDEX(category, wildcard)
and adding yyyymmdd would not help unless you add a LIMIT
.
And the query would be:
( SELECT * FROM big_table WHERE category = 'fruits' AND name = 'apple' )
UNION DISTINCT
( SELECT * FROM big_table WHERE category = 'fruits' AND taste = 'sweet' )
UNION DISTINCT
( SELECT * FROM big_table WHERE category = 'fruits' AND wildcard = '*' )
ORDER BY yyyymmdd DESC
Adding a limit would be even messier. First tack yyyymmdd
on the end of each of the three composite indexes, then
( SELECT ... ORDER BY yyyymmdd DESC LIMIT 10 )
UNION DISTINCT
( SELECT ... ORDER BY yyyymmdd DESC LIMIT 10 )
UNION DISTINCT
( SELECT ... ORDER BY yyyymmdd DESC LIMIT 10 )
ORDER BY yyyymmdd DESC LIMIT 10
Adding an OFFSET would be even worse.
Two other techniques -- "covering" index and "lazy lookup" might help, but I doubt it.
Yet another technique is to put all the words in the same column and use a FULLTEXT
index. But this may be problematical for several reasons.