Search code examples
mysqlprepared-statement

Optimizing Parameterized MySQL Queries


I have a query that has a number of parameters which if I run from in MySQLWorkbench takes around a second to run.

If I take this query and get rid of the parameters and instead substitute the values into the query then it takes about 22 seconds to run, same as If I convert this query to a parameterized stored procedure and run it (it then takes about 22 seconds).

I've enabled profiling on MySQL and I can see a few things there. For example, it shows the number of rows examined and there's an order of difference (20,000 to 400,000) which I assume is the reason for the 20x increase in processing time.

The other difference in the profile is that the parameterized query sent from MySQLWorkbench still has the parameters in (e.g. where limit < @lim) while the sproc the values have been set (where limit < 300).

I've tried this a number of different ways, I'm using JetBrains's DataGrip (as well as MySQLWorkbench) and that works like MySQLWorkbench (sends through the @ parameters), I've tried executing the queries and the sproc from MySQLWorkbench, DataGrip, Java (JDBC) and .Net. I've also tried prepared statements in Java but I can't get anywhere near the performance of sending the 'raw' SQL to MySQL.

I feel like I'm missing something obvious here but I don't know what it is.

The query is relatively complex, it has a CTE a couple of sub-selects and a couple of joins, but as I said it runs quickly straight from MySQL.

My main question is why the query is 20x faster in one format than another. Does the way the query is sent to MySQL have anything to do with this (the '@' values sent through and can I replicate this in a stored procedure?

Updated 1st Jan

Thanks for the comments, I didn't post the query originally as I'm more interested in the general concepts around the use of variables/parameters and how I could take advantage of that (or not)

Here is the original query:

with tmp_bat as (select bd.MatchId,
                        bd.matchtype,
                        bd.playerid,
                        bd.teamid,
                        bd.opponentsid,
                        bd.inningsnumber,
                        bd.dismissal,
                        bd.dismissaltype,
                        bd.bowlerid,
                        bd.fielderid,
                        bd.score,
                        bd.position,
                        bd.notout,
                        bd.balls,
                        bd.minutes,
                        bd.fours,
                        bd.sixes,
                        bd.hundred,
                        bd.fifty,
                        bd.duck,
                        bd.captain,
                        bd.wicketkeeper,
                        m.hometeamid,
                        m.awayteamid,
                        m.matchdesignator,
                        m.matchtitle,
                        m.location,
                        m.tossteamid,
                        m.resultstring,
                        m.whowonid,
                        m.howmuch,
                        m.victorytype,
                        m.duration,
                        m.ballsperover,
                        m.daynight,
                        m.LocationId
                 from (select *
                       from battingdetails
                       where matchid in
                             (select id
                              from matches
                              where id in (select matchid from battingdetails)
                                and matchtype = @match_type
                             )) as bd
                          join matches m on m.id = bd.matchid
                          join extramatchdetails emd1
                               on emd1.MatchId = m.Id
                                   and emd1.TeamId = bd.TeamId
                          join extramatchdetails emd2
                               on emd2.MatchId = m.Id
                                   and emd2.TeamId = bd.TeamId
)
select players.fullname                                    name,
       teams.teams                                         team,
       ''                                                  opponents,
       players.sortnamepart,
       innings.matches,
       innings.innings,
       innings.notouts,
       innings.runs,
       HS.score                                            highestscore,
       HS.NotOut,
       CAST(TRUNCATE(innings.runs / (CAST((Innings.Innings - innings.notOuts) AS DECIMAL)),
                     2) AS DECIMAL(7, 2))                  'Avg',
       innings.hundreds,
       innings.fifties,
       innings.ducks,
       innings.fours,
       innings.sixes,
       innings.balls,
       CONCAT(grounds.CountryName, ' - ', grounds.KnownAs) Ground,
       ''                                                  Year,
       ''                                                  CountryName
from (select count(case when inningsnumber = 1 then 1 end)                           matches,
             count(case when dismissaltype != 11 and dismissaltype != 14 then 1 end) innings,
             LocationId,
             playerid,
             MatchType,
             SUM(score)                                                              runs,
             SUM(notout)                                                             notouts,
             SUM(hundred)                                                            Hundreds,
             SUM(fifty)                                                              Fifties,
             SUM(duck)                                                               Ducks,
             SUM(fours)                                                              Fours,
             SUM(sixes)                                                              Sixes,
             SUM(balls)                                                              Balls
      from tmp_bat
      group by MatchType, playerid, LocationId) as innings
         JOIN players ON players.id = innings.playerid
         join grounds on Grounds.GroundId = LocationId and grounds.MatchType = innings.MatchType
         join
     (select pt.playerid, t.matchtype, GROUP_CONCAT(t.name SEPARATOR ', ') as teams
      from playersteams pt
               join teams t on pt.teamid = t.id
      group by pt.playerid, t.matchtype)
         as teams on teams.playerid = innings.playerid and teams.matchtype = innings.MatchType
         JOIN
     (SELECT playerid,
             LocationId,
             MAX(Score)  Score,
             MAX(NotOut) NotOut
      FROM (SELECT battingdetails.playerid,
                   battingdetails.score,
                   battingdetails.notout,
                   battingdetails.LocationId
            FROM tmp_bat as battingdetails
                     JOIN (SELECT battingdetails.playerid,
                                  battingdetails.LocationId,
                                  MAX(battingdetails.Score) AS score
                           FROM tmp_bat as battingdetails
                           GROUP BY battingdetails.playerid,
                                    battingdetails.LocationId,
                                    battingdetails.playerid) AS maxscore
                          ON battingdetails.score = maxscore.score
                              AND battingdetails.playerid = maxscore.playerid
                              AND battingdetails.LocationId = maxscore.LocationId                              ) AS internal
      GROUP BY internal.playerid, internal.LocationId) AS HS
     ON HS.playerid = innings.playerid and hs.LocationId = innings.LocationId

where innings.runs >= @runs_limit

order by runs desc, KnownAs, SortNamePart

limit 0, 300;

Wherever you see '@match_type' then I substitute that for a value ('t'). This query takes ~1.1 secs to run. The query with the hard coded values rather than the variables down to ~3.5 secs (see the other note below). The EXPLAIN for this query gives this:

1,PRIMARY,<derived7>,,ALL,,,,,219291,100,Using temporary; Using filesort
1,PRIMARY,players,,eq_ref,PRIMARY,PRIMARY,4,teams.playerid,1,100,
1,PRIMARY,<derived2>,,ref,<auto_key3>,<auto_key3>,26,"teams.playerid,teams.matchtype",11,100,Using where
1,PRIMARY,grounds,,ref,GroundId,GroundId,4,innings.LocationId,1,10,Using where
1,PRIMARY,<derived8>,,ref,<auto_key0>,<auto_key0>,8,"teams.playerid,innings.LocationId",169,100,
8,DERIVED,<derived3>,,ALL,,,,,349893,100,Using temporary
8,DERIVED,<derived14>,,ref,<auto_key0>,<auto_key0>,13,"battingdetails.PlayerId,battingdetails.LocationId,battingdetails.Score",10,100,Using index
14,DERIVED,<derived3>,,ALL,,,,,349893,100,Using temporary
7,DERIVED,t,,ALL,PRIMARY,,,,3323,100,Using temporary; Using filesort
7,DERIVED,pt,,ref,TeamId,TeamId,4,t.Id,65,100,
2,DERIVED,<derived3>,,ALL,,,,,349893,100,Using temporary
3,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where
3,DERIVED,m,,eq_ref,PRIMARY,PRIMARY,4,matches.Id,1,100,
3,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
3,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
3,DERIVED,battingdetails,,ref,"TeamId,MatchId,match_team",match_team,8,"emd1.TeamId,matches.Id",15,100,
3,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)

and the EXPLAIN for the query with the hardcoded values looks like this:

1,PRIMARY,<derived8>,,ALL,,,,,20097,100,Using temporary; Using filesort
1,PRIMARY,players,,eq_ref,PRIMARY,PRIMARY,4,HS.PlayerId,1,100,
1,PRIMARY,grounds,,ref,GroundId,GroundId,4,HS.LocationId,1,100,Using where
1,PRIMARY,<derived2>,,ref,<auto_key0>,<auto_key0>,30,"HS.LocationId,HS.PlayerId,grounds.MatchType",17,100,Using where
1,PRIMARY,<derived7>,,ref,<auto_key0>,<auto_key0>,46,"HS.PlayerId,innings.MatchType",10,100,Using where
8,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where; Using temporary
8,DERIVED,m,,eq_ref,"PRIMARY,LocationId",PRIMARY,4,matches.Id,1,100,
8,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
8,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
8,DERIVED,<derived14>,,ref,<auto_key2>,<auto_key2>,4,m.LocationId,17,100,
8,DERIVED,battingdetails,,ref,"PlayerId,TeamId,Score,MatchId,match_team",MatchId,8,"matches.Id,maxscore.PlayerId",1,3.56,Using where
8,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
14,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where; Using temporary
14,DERIVED,m,,eq_ref,PRIMARY,PRIMARY,4,matches.Id,1,100,
14,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
14,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
14,DERIVED,battingdetails,,ref,"TeamId,MatchId,match_team",match_team,8,"emd1.TeamId,matches.Id",15,100,
14,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
7,DERIVED,t,,ALL,PRIMARY,,,,3323,100,Using temporary; Using filesort
7,DERIVED,pt,,ref,TeamId,TeamId,4,t.Id,65,100,
2,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where; Using temporary
2,DERIVED,m,,eq_ref,PRIMARY,PRIMARY,4,matches.Id,1,100,
2,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
2,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
2,DERIVED,battingdetails,,ref,"TeamId,MatchId,match_team",match_team,8,"emd1.TeamId,matches.Id",15,100,
2,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)

Pointers as to ways to improve my SQL are always welcome (I'm definitely not a database person), but I''d still like to understand whether I can use the SQL with the variables from code and why that improves the performance by so much

Update 2 1st Jan

AAArrrggghhh. My machine rebooted overnight and now the queries are generally running much quicker. It's still 1 sec vs 3 secs but the 20 times slowdown does seem to have disappeared


Solution

  • Not sure this is the correct answer but I thought I'd post this in case other people have the same issue.

    The issue seems to be the use of CTEs in a stored procedure. I have a query that creates a CTE and then uses that CTE 8 times. If I run this query using interpolated variables it takes about 0.8 sec, if I turn it into a stored procedure and use the stored procedure parameters then it takes about to a minute (between 45 and 63 seconds) to run!

    I've found a couple of ways of fixing this, one is to use multiple temporary tables (8 in this case) as MySQL cannot re-use a temp table in a query. This gets the query time right down but just doesn't fell like a maintainable or scalable solution. The other fix is to leave the variables in place and assign them from the stored procedure parameters, this also has no real performance issues. So my sproc looks like this:

    create procedure bowling_individual_career_records_by_year_for_team_vs_opponent(IN team_id INT,
                                                                                    IN opponents_id INT)
    begin
    
        set @team_id = team_id;
        set @opponents_id = opponents_id;
    
        # use these variables in the SQL below
    
        ...
    
    
    end
    

    Not sure this is the best solution but it works for me and keeps the structure of the SQL the same as it was previously.