I have the following SQL but wish to improve the performance (Option 2 below currently takes about 24 seconds).
EXPLAIN MySQL - using example in answer below
Option 1 - basic select (very slow)
select f.name, f.flowid, m.traceid, m.traceday, m.logtimestamp
from flow f, messageinfo m
where traceid = (select max(traceid) from messageinfo where flowid = f.flowid)
UPDATE
Option 2 - with inner join (faster but still too slow)
select m.traceid, f.name, f.flowid, m.traceday, m.logtimestamp
from flow f
inner join messageinfo m
on m.flowid = f.flowid
where traceid = (select max(traceid) from messageinfo where flowid =
f.flowid)
order by f.name
I need to loop through the flow records (there are approx 900) and for each flow select the related flow in the messageinfo table that is the most recent, i.e. has the highest traceid for that particular flow. Both SQL examples above work but would like to see if I can improve performance for Option 2 if possible. The final result should be a best effort and only return a traceid where a related flow record can be found (like an inner join). Can anyone suggest a more performant way? Bear in mind I would like to return the data from both tables in the final result (as shown with example above).
I need to also emphasize I don't have the freedom to modify the DB with new/updated indexes. So mainly looking for improvements to SQL without DB modifications.
That said, it is still beneficial noting any index improvements for future work.
Another way to rewrite your query as using join, Move your dependent sub query part to sub clause and join this with your main query.
select m.traceid, f.name, f.flowid, m.traceday, m.logtimestamp
from flow f
inner join messageinfo m on m.flowid = f.flowid
inner join (
select flowid, max(traceid) traceid
from messageinfo
group by flowid
) m1 on m.flowid = m1.flowid and m.traceid = m1.traceid
order by f.name
Also add composite index on (flowid,traceid)