I have a table of 'games' where some of those games (but not all) are grouped into 'campaigns' by a campaign ID.
I am trying to write an SQL query which will get a dataset containing various information about the games, but in particular: if a given game is part of a campaign how many games are in that campaign in total (I have this working) and the index of that game within the campaign (e.g. the earliest game in a campaign is index '1', the next is '2' and so on).
I have achieved this, but the execution plan looks terrible and the obvious way to fix that doesn't work, but I get ahead of myself.
Here is the working query, with some extraneous stuff removed:
g1.`id` AS `game_id`,
(SELECT
COUNT(*)
FROM `games` g3
WHERE g3.`campaign` = g1.`campaign`
) AS `campaign_length`,
ca2.`ri` AS `campaign_index`,
ca1.`id` AS `campaign_id`, ca1.`name` AS `campaign_name`
FROM `games` g1
LEFT JOIN `campaigns` ca1 ON ca1.`id` = g1.`campaign`
LEFT JOIN (
SELECT
g4.`id` AS `id`,
ROW_NUMBER() OVER (
PARTITION BY g4.`campaign`
ORDER BY g4.`start` ASC) AS `ri`
FROM `games` g4
) AS ca2 ON ca2.`id` = g1.`id`
WHERE g1.`end` > CURRENT_TIMESTAMP()
AND g1.`gamemaster` = 25
ORDER BY g1.`start` ASC
;
The problem with this version is that for table g4 the execution plan lists a full table scan - which is fine at the moment as there's only a few hundred records, but long term will be terrible for performance, especially as this query (or ones very similar to it) will be executed on many different pages of my website. I believe this is happening because the ROW_NUMBER() function needs to number all the rows before the LEFT JOIN's ON statement can filter them down to the ones I actually need.
The obvious solution, which I have tried to no avail, is to add
WHERE g4.`campaign` = g1.`campaign`
after FROM `games` g4
;
that way ROW_NUMBER() would only need to number those records that have a chance of being returned in the dataset. However this does not work because g1.`campaign`
is not in scope.
I can do WHERE g4.`campaign` IS NOT NULL
which at least gets the execution plan down to a Index Conditional instead of a full table scan, but it will still not scale nicely as the number of games in campaigns grows with time.
I know that my "obvious solution" won't work because of the scope problem, but does anyone have a suggestion for how I can achieve what I'm trying to do without a terrible execution plan?
Based on your comments, the campaign_index
must be calculated before the WHERE
clause is applied. This means that calculation of the campaign_index
will always require a full table scan, as the WHERE
clause can't reduce the rows being computed over.
You can, however, use windowed functions rather than a self join and correlated sub-query...
WITH
games AS
(
SELECT
*,
COUNT(*)
OVER (
PARTITION BY `campaign`
)
AS `campaign_length`,
ROW_NUMBER()
OVER (
PARTITION BY `campaign`
ORDER BY `start`
)
AS `campaign_index`
FROM
games
)
SELECT
games.*,
campaigns.`name` AS `campaign_name`
FROM
games
LEFT JOIN
campaigns
ON campaigns.`id` = games.`campaign`
WHERE
games.`end` > CURRENT_TIMESTAMP()
AND games.`gamemaster` = 25
ORDER BY
games.`start`
;