Search code examples
mysqlperformancejoinexplain

Slow query with joined derived tables


I have a few queries on a "custom dashboard" of my application, and one of them is taking 10-12 seconds to execute. Using EXPLAIN I can see why it's slow, but I don't know what to do about it. Here is the query:

SELECT person.PersonID,FullName,Furigana,qualdate FROM person
  INNER JOIN (
    SELECT pq.PersonID,MAX(ContactDate) AS qualdate FROM person pq
    INNER JOIN contact cq ON pq.PersonID=cq.PersonID
    WHERE cq.ContactTypeID IN (22,26,45) GROUP BY pq.PersonID
  ) qual ON person.PersonID=qual.PersonID
  LEFT OUTER JOIN (
    SELECT pe.personID,MAX(ContactDate) AS elimdate FROM person pe
    INNER JOIN contact ce ON pe.PersonID=ce.PersonID WHERE ce.ContactTypeID IN (25,31,30,41,23,42,2,33,35,29,12)
    GROUP BY pe.PersonID
  ) elim ON qual.PersonID=elim.PersonID
  LEFT OUTER JOIN (
    SELECT po.personID FROM person po
    INNER JOIN percat pc ON po.PersonID=pc.PersonID WHERE pc.CategoryID=38
  ) overseas ON qual.PersonID=overseas.PersonID
  WHERE (elimdate IS NULL OR qualdate > elimdate)
  AND qualdate < CURDATE()-INTERVAL 7 DAY
  AND overseas.PersonID IS NULL
  ORDER BY qualdate

And here is the EXPLAIN result:

id  select_type  table        type    possible_keys           key         key_len  ref                      rows  Extra
1   PRIMARY      <derived2>   ALL     NULL                    NULL        NULL     NULL                     5447  Using where; Using temporary; Using filesort
1   PRIMARY      <derived3>   ALL     NULL                    NULL        NULL     NULL                     5565  Using where
1   PRIMARY      <derived4>   ALL     NULL                    NULL        NULL     NULL                     9     Using where; Not exists
1   PRIMARY      person       eq_ref  PRIMARY                 PRIMARY     4        qual.PersonID            1     
4   DERIVED      pc           ref     PRIMARY,CategoryID      CategoryID  4                                 8     
4   DERIVED      po           eq_ref  PRIMARY                 PRIMARY     4        kizuna_misa.pc.PersonID  1     Using index
3   DERIVED      pe           index   PRIMARY                 PRIMARY     4        NULL                     5964  Using index
3   DERIVED      ce           ref     PersonID,ContactTypeID  PersonID    4        kizuna_misa.pe.PersonID  1     Using where
2   DERIVED      pq           index   PRIMARY                 PRIMARY     4        NULL                     5964  Using index
2   DERIVED      cq           ref     PersonID,ContactTypeID  PersonID    4        kizuna_misa.pq.PersonID  1     Using where

I'm sure the first line of the EXPLAIN reveals the problem (comparing with similar queries, it appears that the second line isn't too slow), but I don't know how to fix it. I already have indexes on every column that appears in the joins, but since the tables are <derived2> etc., I guess indexes are irrelevant.

The objective (since it's probably not obvious to someone unfamiliar with my application and schema) is a followup tickler list - if one of the #22/26/45 contacts has occurred but nothing has been done in response (either one of several other contacts or designating by a category assignment that the person is overseas), then the person should appear in the list for followup after waiting a week. Subqueries are easier for me to write and understand than these messy joins, but I can't check the sequence of dates (and subqueries are often slow, also).

EDIT (in response to Rick James):

MySQL version is 5.0.95 (yeah, I know...). And here is SHOW CREATE TABLE for the three tables involved, even though most of the fields in person are irrelevant:

CREATE TABLE `contact` (
 `ContactID` int(11) unsigned NOT NULL auto_increment,
 `PersonID` int(11) unsigned NOT NULL default '0',
 `ContactTypeID` int(11) unsigned NOT NULL default '0',
 `ContactDate` date NOT NULL default '0000-00-00',
 `Description` text,
 PRIMARY KEY  (`ContactID`),
 KEY `ContactDate` (`ContactDate`),
 KEY `PersonID` (`PersonID`),
 KEY `ContactTypeID` (`ContactTypeID`)
) ENGINE=MyISAM AUTO_INCREMENT=16901 DEFAULT CHARSET=utf8

CREATE TABLE `percat` (
 `PersonID` int(11) unsigned NOT NULL default '0',
 `CategoryID` int(11) unsigned NOT NULL default '0',
 PRIMARY KEY  (`PersonID`,`CategoryID`),
 KEY `CategoryID` (`CategoryID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

CREATE TABLE `person` (
 `PersonID` int(11) unsigned NOT NULL auto_increment,
 `FullName` varchar(100) NOT NULL default '',
 `Furigana` varchar(100) NOT NULL default '',
 `Sex` enum('','M','F') character set ascii NOT NULL default '',
 `HouseholdID` int(11) unsigned NOT NULL default '0',
 `Relation` varchar(6) character set ascii NOT NULL default '',
 `Title` varchar(6) NOT NULL default '',
 `CellPhone` varchar(30) character set ascii NOT NULL default '',
 `Email` varchar(70) character set ascii NOT NULL default '',
 `Birthdate` date NOT NULL default '0000-00-00',
 `Country` varchar(30) NOT NULL default '',
 `URL` varchar(150) NOT NULL default '',
 `Organization` tinyint(1) NOT NULL default '0',
 `Remarks` text NOT NULL,
 `Photo` tinyint(1) NOT NULL default '0',
 `UpdDate` date NOT NULL default '0000-00-00',
 PRIMARY KEY  (`PersonID`),
 KEY `Furigana` (`Furigana`),
 KEY `FullName` (`FullName`),
 KEY `Email` (`Email`),
 KEY `Organization` (`Organization`,`Furigana`)
) ENGINE=MyISAM AUTO_INCREMENT=6063 DEFAULT CHARSET=utf8

Attempted suggestion:

I tried to implement Rick James's suggestion of putting the subselects in the field list (I didn't even know that was possible), like this:

SELECT
  p.PersonID,
  FullName,
  Furigana,
  (SELECT MAX(ContactDate) FROM contact cq
    WHERE cq.PersonID=p.PersonID
    AND  cq.ContactTypeID IN (22,26,45))
  AS qualdate,
  (SELECT MAX(ContactDate) FROM contact ce
    WHERE ce.PersonID=p.PersonID
    AND ce.ContactTypeID IN (25,31,30,41,23,42,2,33,35,29,12))
  AS elimdate
FROM person p
WHERE (elimdate IS NULL OR qualdate > elimdate)
AND qualdate < CURDATE()-INTERVAL 7 DAY
AND NOT EXISTS (SELECT * FROM percat WHERE CategoryID=38 AND percat.PersonID=p.PersonID)
ORDER BY qualdate

But it complains: #1054 - Unknown column 'elimdate' in 'where clause' According to the docs, WHERE clauses are interpreted before field lists, so this approach isn't going to work.


Solution

  • You have an interesting query. I am not sure what the best solution is. Here are two guesses:

    Plan A

    INDEX(qualdate)
    

    may help. Please provide SHOW CREATE TABLE.

    This construct optimizes poorly:

    FROM ( SELECT ... )
    JOIN ( SELECT ... )
    

    In your case, overseas should probably turned into a JOIN, not a subselect. And the other two should probably be turned into a different flavor of dependent subquery:

    SELECT  ..., 
            ( SELECT MAX(...) ... )  AS qualdate,
            ( SELECT MAX(...) ... )  AS elimdate
        FROM ...
    

    What version of MySQL are you running?

    Plan B

    If practical, fold these into the subqueries so that they generate fewer rows, thereby leading to less effort at the outer query. (One per subquery)

    elimdate IS NOT NULL
    qualdate < CURDATE()-INTERVAL 7 DAY
    overseas.PersonID IS NOT NULL
    

    Perhaps the NULL tests apply to LEFT and this suggestion may not apply.