Search code examples
databasemysqlsubquerycorrelated-subquery

MySQL multiple dependent subqueries, painfully slow


I have a working query that retrieves the data that I need, but unfortunately it is painfully slow (runs over 3 minutes). I have indexes in place, but I think the problem is the multiple dependent subqueries. I've been trying to rewrite the query using joins but I can't seem to get it to work. Any help would be greatly appreciated.

The tables:

Basically, I have 2 tables. The first (prices) holds the prices of items in a store. Each row is the price of an item that day, and new rows are added every day with an updated price.

The second table (watches_US) holds the item information (name, description, etc).

CREATE TABLE `prices` (
`prices_id` int(11) NOT NULL auto_increment,
`prices_locale` enum('CA','DE','FR','JP','UK','US') NOT NULL default 'US',
`prices_watches_ID` char(10) NOT NULL,
`prices_date` datetime NOT NULL,
`prices_am` varchar(10) default NULL,
`prices_new` varchar(10) default NULL,
`prices_used` varchar(10) default NULL,
PRIMARY KEY  (`prices_id`),
KEY `prices_am` (`prices_am`),
KEY `prices_locale` (`prices_locale`),
KEY `prices_watches_ID` (`prices_watches_ID`),
KEY `prices_date` (`prices_date`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=61764 ;

CREATE TABLE `watches_US` (
`watches_ID` char(10) NOT NULL,
`watches_date_added` datetime NOT NULL,
`watches_last_update` datetime default NULL,
`watches_title` varchar(255) default NULL,
`watches_small_image_height` int(11) default NULL,
`watches_small_image_width` int(11) default NULL,
`watches_description` text,
PRIMARY KEY  (`watches_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;   

The query retrieves the last 10 prices changes over a period of 30 hours, ordered by the size of the price change. So I have subqueries to get the newest price, the oldest price within 30 hours, and then to calculate the price change.

Here's the query:

SELECT watches_US.*, prices.*, watches_US.watches_ID as current_ID,
    ( SELECT prices_am FROM prices WHERE prices_watches_ID = current_ID AND prices_locale = 'US' ORDER BY prices_date DESC LIMIT 1 ) as new_price, 
    ( SELECT prices_date FROM prices WHERE prices_watches_ID = current_ID AND prices_locale = 'US' ORDER BY prices_date DESC LIMIT 1 ) as new_price_date, 
    ( SELECT prices_am FROM prices WHERE ( prices_watches_ID = current_ID AND prices_locale = 'US') AND ( prices_date >= DATE_SUB(new_price_date,INTERVAL 30 HOUR) ) ORDER BY prices_date ASC LIMIT 1 ) as old_price,
    ( SELECT ROUND(((new_price - old_price)/old_price)*100,2) ) as percent_change,
    ( SELECT (new_price - old_price) ) as absolute_change
FROM watches_US 
LEFT OUTER JOIN prices ON prices.prices_watches_ID = watches_US.watches_ID 
WHERE ( prices_locale = 'US' )
AND ( prices_am IS NOT NULL )
AND ( prices_am != '' )
HAVING ( old_price IS NOT NULL )
AND ( old_price != 0 )
AND ( old_price != '' )
AND ( absolute_change < 0 )
AND ( prices.prices_date = new_price_date )
ORDER BY absolute_change ASC
LIMIT 10

How would I rewrite this to use joins instead, or otherwise optimize this so it doesn't take over 3 minutes to get a result? Any help would be greatly appreciated!

Thank you kindly.

UPDATE

Using the answers from below, I got the query down to this, which takes 2 seconds to run:

SELECT watches_US.*, prices.*,
    ( SELECT prices_am FROM prices prices2 WHERE ( prices2.prices_watches_ID = watches_US.watches_ID AND prices2.prices_locale = 'US') AND ( prices2.prices_date >= DATE_SUB(prices.prices_date,INTERVAL 30 HOUR) ) ORDER BY prices2.prices_date ASC LIMIT 1 ) as old_price,
    ( SELECT ROUND(((prices.prices_am - old_price)/old_price)*100,2) ) as percent_change,
    ( SELECT (prices.prices_am - old_price) ) as absolute_change
FROM watches_US 
LEFT OUTER JOIN prices ON prices.prices_watches_ID = watches_US.watches_ID AND prices.prices_locale = 'US'
WHERE ( prices.prices_am IS NOT NULL )
AND ( prices.prices_am != '' )
AND ( prices.prices_date IN (SELECT MAX(prices_date) FROM prices WHERE prices_watches_ID = watches_US.watches_ID AND prices_locale = 'US' ) )
HAVING ( old_price IS NOT NULL )
AND ( old_price != 0 )
AND ( old_price != '' )
AND ( absolute_change < 0 )
ORDER BY absolute_change ASC
LIMIT 10

It could probably still do with some work, but it's usable as is. Thank you all for your help!


Solution

  • Here's a partial idea:

    SELECT watches_US.*, prices.*, watches_US.watches_ID as current_ID,
        prices2.prices_am as new_price, 
        prices2.prices_date as new_price_date, 
        ( SELECT prices_am FROM prices WHERE ( prices_watches_ID = current_ID AND prices_locale = 'US') AND ( prices_date >= DATE_SUB(new_price_date,INTERVAL 30 HOUR) ) ORDER BY prices_date ASC LIMIT 1 ) as old_price,
        ( SELECT ROUND(((new_price - old_price)/old_price)*100,2) ) as percent_change,
        ( SELECT (new_price - old_price) ) as absolute_change
    FROM watches_US 
    LEFT OUTER JOIN prices ON prices.prices_watches_ID = watches_US.watches_ID 
    LEFT OUTER JOIN prices prices2 ON prices2.prices_watches_ID = watches_US.watches_ID 
    WHERE ( prices_locale = 'US' )
    AND ( prices_am IS NOT NULL )
    AND ( prices_am != '' )
    AND ( prices2.prices_date IN (SELECT MAX(price_date) FROM prices WHERE prices_watches_ID = watches_US.watches_ID AND prices_locale = 'US' )
    HAVING ( old_price IS NOT NULL )
    AND ( old_price != 0 )
    AND ( old_price != '' )
    AND ( absolute_change < 0 )
    AND ( prices.prices_date = new_price_date )
    ORDER BY absolute_change ASC
    LIMIT 10
    

    The changes are the second join on prices that's used to get new_price and new_price_date with the a WHERE clause to select only the most recent entry. You could probably clean it up a bit but I wanted to get it out there.