Search code examples
mysqlsqlsql-order-bysql-likesql-execution-plan

Really slow query used to be really fast. Explain shows rows=1 on local backup but rows=2287359 on server


I check for spam every now and then using "select * from posts where post like '%http://%' order by id desc limit 10" and searching a few other keywords. Lately the select is impossibly slow.

mysql> explain select * from posts where reply like "%http://%" order by id desc limit 1;
+----+-------------+-----------+-------+---------------+---------+---------+------+---------+-------------+
| id | select_type | table     | type  | possible_keys | key     | key_len | ref  | rows    | Extra       |
+----+-------------+-----------+-------+---------------+---------+---------+------+---------+-------------+
|  1 | SIMPLE      | posts     | index | NULL          | PRIMARY | 4       | NULL | 2287347 | Using where |
+----+-------------+-----------+-------+---------------+---------+---------+------+---------+-------------+
1 row in set (0.00 sec)

On my netbook with 1 gig ram the only difference is it shows the "ROWS" column as being 1. There is only 1.3 mil posts in my netbook. The server has like 6 gigs ram and a fast processor. What should I optimize so it's not horribly slow. Recently I added an index to search by userId, which I'm not sure was a smart choice, but I added it to the backup and production server both a little before this issue started happening. I'd imagine it's related to it not being able to sort in ram due to a missed tweak?

It also seems to be slow when I do stuff like "delete from posts where threadId=X", dunno if related.


Solution

  • With respect to

    SELECT * FROM posts WHERE reply LIKE "%http://%" ORDER BY id DESC LIMIT 1

    Due to the wild cards on both sides of the http://, MySQL will can not use an index on reply to quickly find what you're looking for. Moreover, since you're asking for the one with the largest id, MySQL will have to pull all results to make sure that you have the one with the largest `id'.

    Depending how much of the data of the posts table is made up of the reply, it might be worthwhile to add a compound index on (id, reply), and change the query to something like

    SELECT id FROM posts WHERE reply LIKE "%http://%" ORDER BY id DESC LIMIT 1

    (which will have an index only execution), then join to the posts table or retrive the posts using the retrived ids. If the query has index only execution, and the index fits in memory and is already in memory (due to normal use or by intentionality warming it up), you could potentially speed up the query execution.

    Having said all that, if identical queries on two identical servers with identical data are giving different execution plans and execution times, it might be time to OPTIMIZE TABLE posts to refresh the index statistics and/or defragment the table. If you have recently been adding/removing indexes, things might have gotten astray. Moreover, if the the data is fragmented, when it's pulling rows in PRIMARY KEY order, it could be jumping all over the disk to retrieve the data.

    With respect to DELETE FROM posts WHERE threadId=X, it should be fine as long as there is an index on threadId.