Interpretation of a MySQL Explain and Suggestions to enhance query response time

I am running a filtered down aggregating query and would like a little feedback on how to obtain better response times for the query.

The Query (runs, but averages 400+ seconds):

select data_date,sum(closeprice) from moving_avgs
where
    symbol in (select distinct symbol from moving_avgs
                where
                ma200_close     >= 5.00 and
                ma200_volume    >= 400000 and
                data_date   = (select min(data_date) from moving_avgs
                                where year(data_date) = 2007) 
                )
group by data_date;

My EXPLAIN Query reads (formatted to read in this environment):

id:         1
select_type:    PRIMARY
table:      moving_avgs
type:       ALL
possible_keys:  NULL
key:        NULL
key_len:        NULL
ref:        NULL
rows:       6250033
Extra:      Using where; Using temporary; Using filesort

id:         2
select_type:    DEPENDENT SUBQUERY
table:      moving_avgs
type:       unique_subquery
possible_keys:  PRIMARY,symbol,data_date,ma200_close,ma200_volume
key:        PRIMARY
key_len:        29
ref:        func,const
rows:       1
Extra:      Using where

id:         3
select_type:    SUBQUERY
table:      moving_avgs
type:       index
possible_keys:  NULL
key:        data_date
key_len:        3
ref:        NULL
rows:       6250033
Extra:      Using where; Using index

My my.ini [mysqld] & [myisamchk] sections read (running on a 4GB dual-processor AMD laptop):

[mysqld]
port        = 3306
socket      = /tmp/mysql.sock
skip-external-locking
key_buffer_size = 512M
max_allowed_packet = 20M
table_open_cache = 256
sort_buffer_size = 8M
read_buffer_size = 8M
read_rnd_buffer_size = 16M
myisam_sort_buffer_size = 256M
thread_cache_size = 8
query_cache_size= 132M
basedir=c:/wamp/bin/mysql/mysql5.5.24
log-error=c:/wamp/logs/mysql.log
datadir=c:/wamp/bin/mysql/mysql5.5.24/data
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 8

[myisamchk]
key_buffer_size = 256M
sort_buffer_size = 256M
read_buffer = 4M
write_buffer = 4M

Thanks!

Solution

Could you list the result of a SHOW CREATE TABLE?

Also could you try this variant and see how long it takes:

SELECT  
    data_date,  
    sum(closeprice)  
FROM moving_avgs  
INNER JOIN  
(  
    SELECT distinct symbol  
    FROM moving_avgs  
    WHERE    
        ma200_close     >= 5.00 and    
        ma200_volume    >= 400000 and  
        data_date   =  
        (  
            SELECT min(data_date)  
            FROM moving_avgs  
            WHERE year(data_date) = 2007  
        )   
) symbols ON symbols.symbol = moving_avgs.symbol  
GROUP BY data_date;

I suspect three source of slowness (in combination or separately). The reasoning behind the first two is pretty straightforward:

(1) The indexes on your table may not be designed as well as they could be. I'm not seeing good index usage in your EXPLAIN info.

(2) The way the subquery is designed in the WHERE might be forcing the engine to not use the index you have on 'symbol'--a loss of the performance the index might otherwise give you. The EXPLAIN output makes it look like this loss is the case.

(3) Another way of stating (2) without talking about the roll of indices is that the engine is possibly working inefficiently with the main subquery (the subquery in the WHERE) based on incorrectly deducing a relationship to the outer query (i.e., it thinks there is a relationship--that your query is a correlated subquery--and it's making a bad choice about that relationship).

[Note: The way your WHERE is written, the subquery is NOT a correlated query and it could be executed efficiently, and the IN could be resolved relatively efficiently (although possibly without benefit of an index); however, the engine might be interpreting this situation poorly--you do have a somewhat complicated nested sub-query situation that might make the engine get it wrong].

In any case, moving the subquery to a join could fix such a scenario, because it eliminates any possibility of the engine trying to inefficiently relate the sub-query to the rest of your query. When the subquery is the source for a join, the engine has to resolve it before the rest of the body of your query can be considered. This eliminates any bad deductions about a relationship between subquery and rest of the query that the engine might be making.