Search code examples
mysqlsqlpdoslim

Query with three join incredibly slow


I'm trying to return all the country that have football matches which play in a specific date. The data are defined in the following tables:

competition

id | country_id | name 
50       1         Premier League

competition_seasons

id | competition_id | name
 70       50          2019

competition_rounds

id | season_id | name 
 58       70      Regular Season

match

id | round_id | home | away | result | datetime
 44      58       22     87     1 - 0  2019-03-16:00:00

There are different competitions stored in the competition table, and then each competition can have multiple season which are stored in the competition_seasons. A season can also have different competition rounds, these are stored in competition_rounds.

All the matches are stored in the match table and are grouped for the round_id.

I wrote this method for the API:

$app->get('/country/get_countries/{date}', function (Request $request, Response $response, array $args)
{
  $start_date = $args["date"] . " 00:00";
  $end_date = $args["date"] . " 23:59";

  $sql = $this->db->query("SELECT n.* FROM country n
    LEFT JOIN competition c ON c.country_id = n.id
    LEFT JOIN competition_seasons s ON s.competition_id = c.id
    LEFT JOIN competition_rounds r ON r.season_id = s.id
    LEFT JOIN `match` m ON m.round_id = r.id
    WHERE m.datetime BETWEEN '" . $start_date . "' AND '" . $end_date . "'
    GROUP BY n.id");

  $sql->execute();
  $countries = $sql->fetchAll();
  return $response->withJson($countries);
});

there are thousands of records organized by id, but the query took about 6, 7 seconds to return all the countries that play in the specified date.

How can I optimize this process?

Performance

enter image description here

UPDATE

I noticed an interesting thing, if I do:

SELECT round_id, DATE("2019-03-18") FROM `match`

the query is really fast, so I guess the datetime field is slow down the join part, any idea about that?

Table Structure

CREATE TABLE IF NOT EXISTS `swp`.`competition` (
  `id` INT NOT NULL,
  `country_id` INT NULL,
  `name` VARCHAR(255) NULL,
  `category` INT NULL,
  PRIMARY KEY (`id`),
  INDEX `id_idx` (`country_id` ASC),
  INDEX `FK_competition_types_competition_type_id_idx` (`category` ASC),
  CONSTRAINT `FK_country_competition_country_id`
    FOREIGN KEY (`country_id`)
    REFERENCES `swp`.`country` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `FK_competition_categories_competition_category_id`
    FOREIGN KEY (`category`)
    REFERENCES `swp`.`competition_categories` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;


CREATE TABLE IF NOT EXISTS `swp`.`competition_seasons` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `competition_id` INT NOT NULL,
  `season_id` INT NULL,
  `name` VARCHAR(45) NOT NULL,
  `update_at` DATETIME NULL,
  PRIMARY KEY (`id`),
  INDEX `FK_competition_competition_seasons_competition_id_idx` (`competition_id` ASC),
  CONSTRAINT `FK_competition_competition_seasons_competition_id`
    FOREIGN KEY (`competition_id`)
    REFERENCES `swp`.`competition` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;

CREATE TABLE IF NOT EXISTS `swp`.`competition_rounds` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `round_id` INT NULL,
  `season_id` INT NOT NULL,
  `name` VARCHAR(255) NULL,
  PRIMARY KEY (`id`),
  INDEX `FK_competition_seasons_competition_rounds_season_id_idx` (`season_id` ASC),
  CONSTRAINT `FK_competition_seasons_competition_rounds_season_id`
    FOREIGN KEY (`season_id`)
    REFERENCES `swp`.`competition_seasons` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `swp`.`match`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `swp`.`match` (
  `id` INT NOT NULL,
  `round_id` INT NOT NULL,
  `group_id` INT NULL,
  `datetime` DATETIME NULL,
  `status` INT NULL,
  `gameweek` INT NULL,
  `home_team_id` INT NULL,
  `home_team_half_time_score` INT NULL,
  `home_team_score` INT NULL,
  `home_extra_time` INT NULL,
  `home_penalties` INT NULL,
  `away_team_id` INT NULL,
  `away_team_half_time_score` INT NULL,
  `away_team_score` INT NULL,
  `away_extra_time` INT NULL,
  `away_penalties` INT NULL,
  `venue_id` INT NULL,
  `venue_attendance` INT NULL,
  `aggregate_match_id` INT NULL,
  PRIMARY KEY (`id`),
  INDEX `home_team_id_idx` (`home_team_id` ASC),
  INDEX `away_team_id_idx` (`away_team_id` ASC),
  INDEX `venue_id_idx` (`venue_id` ASC),
  INDEX `match_status_id_idx` (`status` ASC),
  INDEX `FK_competition_rounds_match_round_id_idx` (`round_id` ASC),
  INDEX `FK_match_match_aggregate_match_id_idx` (`aggregate_match_id` ASC),
  INDEX `FK_competition_groups_match_group_id_idx` (`group_id` ASC),
  CONSTRAINT `FK_team_match_home_team_id`
    FOREIGN KEY (`home_team_id`)
    REFERENCES `swp`.`team` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `FK_team_match_away_team_id`
    FOREIGN KEY (`away_team_id`)
    REFERENCES `swp`.`team` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `FK_venue_match_venue_id`
    FOREIGN KEY (`venue_id`)
    REFERENCES `swp`.`venue` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `FK_match_status_match_status_id`
    FOREIGN KEY (`status`)
    REFERENCES `swp`.`match_status` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `FK_competition_rounds_match_round_id`
    FOREIGN KEY (`round_id`)
    REFERENCES `swp`.`competition_rounds` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `FK_match_match_aggregate_match_id`
    FOREIGN KEY (`aggregate_match_id`)
    REFERENCES `swp`.`match` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `FK_competition_groups_match_group_id`
    FOREIGN KEY (`group_id`)
    REFERENCES `swp`.`competition_groups` (`id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;

Solution

  • With LEFT JOIN, the query can only be executed top-bottom, meaning the last table is scanned for every product of entries in the before tables. Also, using LEFT JOIN and GROUP BY without any aggregate makes no sense, because it will always return all country ids. This having said, I would rewrite it like this:

    SELECT DISTINCT
        c.country_id
    FROM 
        competition c,
    WHERE 
    
        EXISTS (
            SELECT 
                *
            FROM
                competition_seasons s,
                competition_rounds r,
                `match` m
            WHERE
                s.competition_id = c.id
                AND r.season_id = s.id
                AND m.round_id = r.id 
                AND m.datetime BETWEEN ...
        )
    

    This will be correctly optimized by all RDB's I know of. Note, an 2-column index on (match.datetime, match.round_id) - in this order, will make a huge performance impact. Or is write speed is a concern, at least a single column index on (match.datetime) would be recommended.

    Important note about indexes on strings: String comparison is always quirky in RDBs. Make sure you use a binary collation for the datetime column or use native DATETIME format. Various RDBs may fail to use indexes on case-insensitive columns.

    Note I removed the join on n - that just add another PK lookup to check that the country still exists in the countries table. You can add it back in if you don't have any ON DELETE CASCADE or other kind of constraint that ensures data consistency, like this:

    SELECT DISTINCT
        n.id
    FROM 
        country n
    WHERE 
    
        EXISTS (
            SELECT 
                *
            FROM
                competition c,
                competition_seasons s,
                competition_rounds r,
                `match` m
            WHERE
                c.country_id=n.id
                AND s.competition_id = c.id
                AND r.season_id = s.id
                AND m.round_id = r.id 
                AND m.datetime BETWEEN ...
        )