Search code examples
mysqlsql-order-byinnodblimit

MySQL - COUNT and ORDER BY performance is very slow on large tables


The following query has an average duration of 10 seconds:

SELECT masters_genres.*, masters_artists.*, 
COUNT(masters_artists.master_id) as quantity FROM masters_genres 
JOIN masters_artists ON masters_genres.master_id = masters_artists.master_id 
WHERE masters_genres.genre='Electronic' GROUP BY masters_artists.artist_id 
ORDER BY quantity DESC LIMIT 25

The tables have 2 and 3 million records.

Table structure:

-- -----------------------------------------------------
-- Table `music_data`.`masters_artists`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `music_data`.`masters_artists` (
  `master_id` INT(30) NOT NULL,
  `artist_id` INT(30) NOT NULL,
  `artist_name` VARCHAR(500) CHARACTER SET 'utf8' NOT NULL,
  INDEX `artist_id` (`artist_id` ASC),
  INDEX `fk_masters_artists_masters_idx` (`master_id` ASC),
  CONSTRAINT `fk_masters_artists_masters`
    FOREIGN KEY (`master_id`)
    REFERENCES `music_data`.`masters` (`master_id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB
DEFAULT CHARACTER SET = latin1;


-- -----------------------------------------------------
-- Table `music_data`.`masters_genres`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `music_data`.`masters_genres` (
  `master_id` INT(30) NOT NULL,
  `genre` VARCHAR(255) CHARACTER SET 'utf8' NOT NULL,
  INDEX `genre` (`genre` ASC),
  INDEX `fk_masters_genres_masters1_idx` (`master_id` ASC),
  CONSTRAINT `fk_masters_genres_masters1`
    FOREIGN KEY (`master_id`)
    REFERENCES `music_data`.`masters` (`master_id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB
DEFAULT CHARACTER SET = latin1;

What can I do to speed up this query? I know the speed is going down because of ...

ORDER BY quantity DESC

but i don't know how to improve my query to get the right results.


Solution

  • I tried another solution and generated a helper table. I found out that it is very important to find the right order for the composite index: 1. position should be the "group by" column, in my case artist_id 2. position should be the column I used for the "count", here master_id Then the WHERE clause.

    In the beginning i used the same column combination for the index, but it was much slower. Now I am able to get the results after 1 second, a wonderful performance increase compared to the 10 seconds before.