There is a table words
contains word
and id
columns and 50000 records. I know words with the structure %XC%A
are between the id=30000
and the id=35000
.
Now consider the following queries:
SELECT * FROM words WHERE word LIKE '%XCX%A'
and
SELECT * FROM words WHERE id>30000 and id < 35000 and word LIKE '%XCX%A'
From time consuming perspective, is there any difference between them?
Well, let's find out...
Here's a data set of approximately 50000 words. Some of the words (but only in the range 30000 to 35000) follow the pattern described:
EXPLAIN
SELECT * FROM words WHERE word LIKE '%XCX%A';
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
| 1 | SIMPLE | words | index | NULL | word | 14 | NULL | 50976 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
EXPLAIN
SELECT * FROM words WHERE id>30000 and id < 35000 and word LIKE '%XCX%A';
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | words | range | PRIMARY | PRIMARY | 4 | NULL | 1768 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
We can see that the first query scans the entire dataset (50976 rows), while the second query only scans rows between the given ids (in my example there are approximately 1768 rows between ids 30000 and 35000; there are lots of unused ids, but that's just a side effect of the way in which the data was created).
So, we can see that by adding the range, MySQL only has to scan (at worst) one fifth of the data set (5000 rows instead oof 50000 rows). This isn't going to make much of a difference on such a small dataset, but it will on dataset 100, or 1000 times this size.
One thing to note is that the two queries will return the same data set (because we know that valid values are only to be found within that id range), but they won't necessarily return the dataset in the same order. For consistency, you would need an ORDER BY clause.
Another thing to note is, of course, that there's no point indexing word
(for this query anyway), because '%...'
cannot use an index.