I'm trying to return all the country that have football matches
which play in a specific date
. The data are defined in the following tables:
competition
id | country_id | name
50 1 Premier League
competition_seasons
id | competition_id | name
70 50 2019
competition_rounds
id | season_id | name
58 70 Regular Season
match
id | round_id | home | away | result | datetime
44 58 22 87 1 - 0 2019-03-16:00:00
There are different competitions stored in the competition
table, and then each competition can have multiple season
which are stored in the competition_seasons
. A season
can also have different competition rounds
, these are stored in competition_rounds
.
All the matches
are stored in the match
table and are grouped for the round_id
.
I wrote this method for the API:
$app->get('/country/get_countries/{date}', function (Request $request, Response $response, array $args)
{
$start_date = $args["date"] . " 00:00";
$end_date = $args["date"] . " 23:59";
$sql = $this->db->query("SELECT n.* FROM country n
LEFT JOIN competition c ON c.country_id = n.id
LEFT JOIN competition_seasons s ON s.competition_id = c.id
LEFT JOIN competition_rounds r ON r.season_id = s.id
LEFT JOIN `match` m ON m.round_id = r.id
WHERE m.datetime BETWEEN '" . $start_date . "' AND '" . $end_date . "'
GROUP BY n.id");
$sql->execute();
$countries = $sql->fetchAll();
return $response->withJson($countries);
});
there are thousands of records organized by id, but the query took about 6, 7 seconds to return all the countries
that play in the specified date.
How can I optimize this process?
Performance
UPDATE
I noticed an interesting thing, if I do:
SELECT round_id, DATE("2019-03-18") FROM `match`
the query is really fast, so I guess the datetime
field is slow down the join part, any idea about that?
Table Structure
CREATE TABLE IF NOT EXISTS `swp`.`competition` (
`id` INT NOT NULL,
`country_id` INT NULL,
`name` VARCHAR(255) NULL,
`category` INT NULL,
PRIMARY KEY (`id`),
INDEX `id_idx` (`country_id` ASC),
INDEX `FK_competition_types_competition_type_id_idx` (`category` ASC),
CONSTRAINT `FK_country_competition_country_id`
FOREIGN KEY (`country_id`)
REFERENCES `swp`.`country` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `FK_competition_categories_competition_category_id`
FOREIGN KEY (`category`)
REFERENCES `swp`.`competition_categories` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
CREATE TABLE IF NOT EXISTS `swp`.`competition_seasons` (
`id` INT NOT NULL AUTO_INCREMENT,
`competition_id` INT NOT NULL,
`season_id` INT NULL,
`name` VARCHAR(45) NOT NULL,
`update_at` DATETIME NULL,
PRIMARY KEY (`id`),
INDEX `FK_competition_competition_seasons_competition_id_idx` (`competition_id` ASC),
CONSTRAINT `FK_competition_competition_seasons_competition_id`
FOREIGN KEY (`competition_id`)
REFERENCES `swp`.`competition` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
CREATE TABLE IF NOT EXISTS `swp`.`competition_rounds` (
`id` INT NOT NULL AUTO_INCREMENT,
`round_id` INT NULL,
`season_id` INT NOT NULL,
`name` VARCHAR(255) NULL,
PRIMARY KEY (`id`),
INDEX `FK_competition_seasons_competition_rounds_season_id_idx` (`season_id` ASC),
CONSTRAINT `FK_competition_seasons_competition_rounds_season_id`
FOREIGN KEY (`season_id`)
REFERENCES `swp`.`competition_seasons` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `swp`.`match`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `swp`.`match` (
`id` INT NOT NULL,
`round_id` INT NOT NULL,
`group_id` INT NULL,
`datetime` DATETIME NULL,
`status` INT NULL,
`gameweek` INT NULL,
`home_team_id` INT NULL,
`home_team_half_time_score` INT NULL,
`home_team_score` INT NULL,
`home_extra_time` INT NULL,
`home_penalties` INT NULL,
`away_team_id` INT NULL,
`away_team_half_time_score` INT NULL,
`away_team_score` INT NULL,
`away_extra_time` INT NULL,
`away_penalties` INT NULL,
`venue_id` INT NULL,
`venue_attendance` INT NULL,
`aggregate_match_id` INT NULL,
PRIMARY KEY (`id`),
INDEX `home_team_id_idx` (`home_team_id` ASC),
INDEX `away_team_id_idx` (`away_team_id` ASC),
INDEX `venue_id_idx` (`venue_id` ASC),
INDEX `match_status_id_idx` (`status` ASC),
INDEX `FK_competition_rounds_match_round_id_idx` (`round_id` ASC),
INDEX `FK_match_match_aggregate_match_id_idx` (`aggregate_match_id` ASC),
INDEX `FK_competition_groups_match_group_id_idx` (`group_id` ASC),
CONSTRAINT `FK_team_match_home_team_id`
FOREIGN KEY (`home_team_id`)
REFERENCES `swp`.`team` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `FK_team_match_away_team_id`
FOREIGN KEY (`away_team_id`)
REFERENCES `swp`.`team` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `FK_venue_match_venue_id`
FOREIGN KEY (`venue_id`)
REFERENCES `swp`.`venue` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `FK_match_status_match_status_id`
FOREIGN KEY (`status`)
REFERENCES `swp`.`match_status` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `FK_competition_rounds_match_round_id`
FOREIGN KEY (`round_id`)
REFERENCES `swp`.`competition_rounds` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `FK_match_match_aggregate_match_id`
FOREIGN KEY (`aggregate_match_id`)
REFERENCES `swp`.`match` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `FK_competition_groups_match_group_id`
FOREIGN KEY (`group_id`)
REFERENCES `swp`.`competition_groups` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
With LEFT JOIN
, the query can only be executed top-bottom, meaning the last table is scanned for every product of entries in the before tables. Also, using LEFT JOIN
and GROUP BY
without any aggregate makes no sense, because it will always return all country ids. This having said, I would rewrite it like this:
SELECT DISTINCT
c.country_id
FROM
competition c,
WHERE
EXISTS (
SELECT
*
FROM
competition_seasons s,
competition_rounds r,
`match` m
WHERE
s.competition_id = c.id
AND r.season_id = s.id
AND m.round_id = r.id
AND m.datetime BETWEEN ...
)
This will be correctly optimized by all RDB's I know of.
Note, an 2-column index on (match.datetime, match.round_id)
- in this order, will make a huge performance impact. Or is write speed is a concern, at least a single column index on (match.datetime)
would be recommended.
Important note about indexes on strings: String comparison is always quirky in RDBs. Make sure you use a binary collation for the datetime column or use native DATETIME format. Various RDBs may fail to use indexes on case-insensitive columns.
Note I removed the join on n - that just add another PK lookup to check that the country still exists in the countries table. You can add it back in if you don't have any ON DELETE CASCADE or other kind of constraint that ensures data consistency, like this:
SELECT DISTINCT
n.id
FROM
country n
WHERE
EXISTS (
SELECT
*
FROM
competition c,
competition_seasons s,
competition_rounds r,
`match` m
WHERE
c.country_id=n.id
AND s.competition_id = c.id
AND r.season_id = s.id
AND m.round_id = r.id
AND m.datetime BETWEEN ...
)