I am running a filtered down aggregating query and would like a little feedback on how to obtain better response times for the query.
The Query (runs, but averages 400+ seconds):
select data_date,sum(closeprice) from moving_avgs
where
symbol in (select distinct symbol from moving_avgs
where
ma200_close >= 5.00 and
ma200_volume >= 400000 and
data_date = (select min(data_date) from moving_avgs
where year(data_date) = 2007)
)
group by data_date;
My EXPLAIN Query reads (formatted to read in this environment):
id: 1
select_type: PRIMARY
table: moving_avgs
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 6250033
Extra: Using where; Using temporary; Using filesort
id: 2
select_type: DEPENDENT SUBQUERY
table: moving_avgs
type: unique_subquery
possible_keys: PRIMARY,symbol,data_date,ma200_close,ma200_volume
key: PRIMARY
key_len: 29
ref: func,const
rows: 1
Extra: Using where
id: 3
select_type: SUBQUERY
table: moving_avgs
type: index
possible_keys: NULL
key: data_date
key_len: 3
ref: NULL
rows: 6250033
Extra: Using where; Using index
My my.ini [mysqld] & [myisamchk] sections read (running on a 4GB dual-processor AMD laptop):
[mysqld]
port = 3306
socket = /tmp/mysql.sock
skip-external-locking
key_buffer_size = 512M
max_allowed_packet = 20M
table_open_cache = 256
sort_buffer_size = 8M
read_buffer_size = 8M
read_rnd_buffer_size = 16M
myisam_sort_buffer_size = 256M
thread_cache_size = 8
query_cache_size= 132M
basedir=c:/wamp/bin/mysql/mysql5.5.24
log-error=c:/wamp/logs/mysql.log
datadir=c:/wamp/bin/mysql/mysql5.5.24/data
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 8
[myisamchk]
key_buffer_size = 256M
sort_buffer_size = 256M
read_buffer = 4M
write_buffer = 4M
Thanks!
Could you list the result of a SHOW CREATE TABLE?
Also could you try this variant and see how long it takes:
SELECT
data_date,
sum(closeprice)
FROM moving_avgs
INNER JOIN
(
SELECT distinct symbol
FROM moving_avgs
WHERE
ma200_close >= 5.00 and
ma200_volume >= 400000 and
data_date =
(
SELECT min(data_date)
FROM moving_avgs
WHERE year(data_date) = 2007
)
) symbols ON symbols.symbol = moving_avgs.symbol
GROUP BY data_date;
I suspect three source of slowness (in combination or separately). The reasoning behind the first two is pretty straightforward:
(1) The indexes on your table may not be designed as well as they could be. I'm not seeing good index usage in your EXPLAIN info.
(2) The way the subquery is designed in the WHERE might be forcing the engine to not use the index you have on 'symbol'--a loss of the performance the index might otherwise give you. The EXPLAIN output makes it look like this loss is the case.
(3) Another way of stating (2) without talking about the roll of indices is that the engine is possibly working inefficiently with the main subquery (the subquery in the WHERE) based on incorrectly deducing a relationship to the outer query (i.e., it thinks there is a relationship--that your query is a correlated subquery--and it's making a bad choice about that relationship).
[Note: The way your WHERE is written, the subquery is NOT a correlated query and it could be executed efficiently, and the IN could be resolved relatively efficiently (although possibly without benefit of an index); however, the engine might be interpreting this situation poorly--you do have a somewhat complicated nested sub-query situation that might make the engine get it wrong].
In any case, moving the subquery to a join could fix such a scenario, because it eliminates any possibility of the engine trying to inefficiently relate the sub-query to the rest of your query. When the subquery is the source for a join, the engine has to resolve it before the rest of the body of your query can be considered. This eliminates any bad deductions about a relationship between subquery and rest of the query that the engine might be making.