Search code examples
mysqlexplainfilesort

MySQL Explain: what's causing 'Using temporary; Using filesort'


I'm planning on creating a view using this SQL SELECT, but the explain for it shows it's using temporary and using filesort. I can't figure out what indices I need in order to fix this problem. Mostly, I'm wondering why it's using filesort intead of using an index to sort.

Here are my tables:

CREATE TABLE `learning_signatures` (
  `signature_id` int(11) NOT NULL AUTO_INCREMENT,
  `signature_file` varchar(100) NOT NULL,
  `signature_md5` varchar(32) NOT NULL,
  `image_file` varchar(100) NOT NULL,
  PRIMARY KEY (`signature_id`),
  UNIQUE KEY `unique_signature_md5` (`signature_md5`)
) ENGINE=InnoDB AUTO_INCREMENT=640 DEFAULT CHARSET=latin1

CREATE TABLE `learning_user_suggestions` (
  `user_suggestion_id` int(11) NOT NULL AUTO_INCREMENT,
  `signature_id` int(11) NOT NULL,
  `ch` char(1) NOT NULL,
  `time_suggested` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `user_id` int(11) NOT NULL,
  PRIMARY KEY (`user_suggestion_id`),
  KEY `char_index` (`ch`),
  KEY `ls_sig_id_indx` (`signature_id`),
  KEY `user_id_indx` (`user_id`),
  KEY `sig_char_indx` (`signature_id`,`ch`)
) ENGINE=InnoDB AUTO_INCREMENT=1173 DEFAULT CHARSET=latin1

And here is the problematic SQL statement I'm planning on using in my view:

select ls.signature_id, ls.signature_file, ls.signature_md5, ls.image_file, sug.ch , count(sug.ch) AS suggestion_count
from (`learning_signatures` `ls` left join `learning_user_suggestions` `sug` on(ls.signature_id = sug.signature_id))
group by ls.signature_id, sug.ch;

Output from explain:

id  select_type table   type    possible_keys                   key             key_len ref                 rows    Extra
1   SIMPLE      ls      ALL     NULL                            NULL            NULL    NULL                514     "Using temporary; Using filesort"
1   SIMPLE      sug     ref     ls_sig_id_indx,sig_char_indx    ls_sig_id_indx  4       wwf.ls.signature_id 1

Another example, this time using a where clause:

explain select ls.signature_id, ls.signature_file, ls.signature_md5, ls.image_file, sug.ch , count(sug.ch) AS suggestion_count
from (`learning_signatures` `ls` left join `learning_user_suggestions` `sug` on(ls.signature_id = sug.signature_id))
WHERE signature_md5 = '75f8a5b1176ecc2487b90bacad9bc4c'
group by ls.signature_id, sug.ch;

Explain output:

id  select_type table   type    possible_keys                key                    key_len ref     rows    Extra
1   SIMPLE      ls      const   unique_signature_md5         unique_signature_md5   34      const   1       "Using temporary; Using filesort"
1   SIMPLE      sug     ref     ls_sig_id_indx,sig_char_indx ls_sig_id_indx         4       const   1   

Solution

  • In your first query, what you do is join your signatures table with user suggestions, getting lots of rows, and then group results using some columns from user suggestions. But there is no index for the joined table to help with grouping as it would have to be defined on previously joined table. What you should instead do is try to create a derived table from user suggestions that is already groupped by ch and signature_id and then join it:

    SELECT ls.signature_id, ls.signature_file, ls.signature_md5, ls.image_file, 
           sug.ch, sug.suggestion_count
    FROM learning_signatures ls
    LEFT JOIN 
      (SELECT s.signature_id, s.ch, count(s.ch) as suggestion_count
        FROM learning_user_suggestions s 
        GROUP BY s.signature_id, s.ch ) as sug
    ON ls.signature_id = sug.signature_id
    

    Optimizer should be able now to use your sig_char_indx index for groupping, the derived table will be not bigger than your signatures table and you join both using unique column. You will still have to do a full scan over signatures table, but that cannot be avoided because you are selecting all of it anyway.

    As for the second query, if you want to restrict signatures to a single one just append

    WHERE ls.signature_md5='75f8a5b1176ecc2487b90bacad9bc4c'
    

    to the end of previous query and group by only s.ch, because only one signature_id will match your md5 anyway. Optimizer should now use md5 index for where and char_index for grouping.