Search code examples
mysqlsqlquery-performancecorrelated-subquery

how to increase performance of summing a column


I am trying to tune my query but I can't go further. is there any chance to tune more this query? especially SUM sub query.

INDEX : db_prices.date

Example of original query:

SELECT
db_villas.id,
db_villas.title1,
db_specials.id AS sid,
db_specials.title1 AS stitle,
db_cities.name AS cityName,
db_counties.name AS countyName,
db_assets.path,
db_villas.bathroom,
db_villas.bedroom,
db_villas.guest,
db_prices.date,
(SELECT SUM(db_prices.price) FROM db_prices WHERE db_prices.date BETWEEN "2016-08-01" AND "2016-09-30" AND db_prices.villaId=db_villas.id) AS price
FROM
db_villas
INNER JOIN db_cities ON db_villas.cityId = db_cities.id
LEFT OUTER JOIN db_specials ON db_villas.specialId = db_specials.id
INNER JOIN db_counties ON db_counties.cityid = db_cities.id AND db_villas.countyId = db_counties.id
INNER JOIN db_assets ON db_assets.guid = db_villas.guid
INNER JOIN db_villafacilities ON db_villafacilities.villaId = db_villas.id
INNER JOIN db_prices ON db_prices.villaId = db_villas.id
WHERE db_prices.date BETWEEN "2016-08-01" AND "2016-09-30" AND db_assets.isMainImage=1 AND db_villas.minRent <= 7
GROUP BY db_villas.id
HAVING (SELECT COUNT(*) FROM db_prices WHERE date BETWEEN "2016-08-01" AND "2016-09-30" AND db_prices.isFree = 0 AND db_prices.villaId = db_villas.id)=0

the query at above executed in 1.2 seconds.

When I remove

(SELECT SUM(db_prices.price) FROM db_prices WHERE db_prices.date BETWEEN "2016-08-01" AND "2016-09-30" AND db_prices.villaId=db_villas.id) AS price

sub query execution time reduced up to 0.009 seconds.

If I just remove this part

AND db_prices.villaId=db_villas.id

from the sub query it's still executed in 0.009 seconds.


Solution

  • MySQL (as of v 5.7) has a query planner without the chops to transform your dependent subquery

     (SELECT SUM(db_prices.price) 
        FROM db_prices
       WHERE db_prices.date BETWEEN "2016-08-01" AND "2016-09-30"
         AND db_prices.villaId=db_villas.id) AS price
    

    into a joinable independent subquery. That means the query planner ends up running that query many times, using up time. So you need to transform it yourself. The independent subquery will look like this:

                     SELECT villaId, 
                            SUM(price) price,  
                            SUM(CASE WHEN isFree = 0 THEN 1 ELSE 0 END) not_free_count
                       FROM db_prices
                      WHERE date BETWEEN  '2016-08-01' AND '2016-09-30'
                      GROUP BY villaId
    

    This query gives you the sum of prices, and the count of villas not tagged isFree, for each villa. This is handy, because you can now JOIN this to the rest of your table. Like so:

     SELECT db_villas.id,
            db_villas.title1, etc etc,
            price_summary.price
       FROM db_villas
      INNER JOIN db_cities ON db_villas.cityId = db_cities.id
       LEFT OUTER JOIN db_specials ON db_villas.specialId = db_specials.id
               etc etc.
       LEFT JOIN (
                     SELECT villaId, 
                            SUM(price) price,  
                            SUM(CASE WHEN isFree = 0 THEN 1 ELSE 0 END) not_free_count
                       FROM db_prices
                      WHERE date BETWEEN  '2016-08-01' AND '2016-09-30'
                      GROUP BY villaId
           ) price_summary ON db_villas.villaId = price_summmary.villaId
     WHERE db_prices.date BETWEEN "2016-08-01" AND "2016-09-30"
       AND etc etc
       AND price_summary.not_free_count = 0
     GROUP BY db_villas.villaId
    

    Then, you need the compound index on db_prices (date, villaId, price, isFree) to optimize your subquery. You may also need indexes on some other columns of other tables.

    Pro tip: Lots of single-column indexes are no substitute for compound indexes in speeding up queries. Indexing lots of columns individually is a common, and notorious, antipattern. Read this: http://use-the-index-luke.com/

    Pro tip: Your query is using the non-standard MySQL extension to GROUP BY. In a version of MySQL you may have soon, this will stop working unless you change some server settings. Read this: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html