I'm not an expert in data-warehousing nor analytics, so I give birth to a monster-query that I'd like to optimise (if possible).
The problem is: I need to display the stagings table for a given tournament. This table should display team id, total score, team position (i.e. bucket for same-score teams), order and latest (!) win-lose strike.
I.e. for the following sequence (most recent first) WWLLLWLD
I should get 2W
W
= win, L
= lose, D
= draft
Schema
create table matches (
id integer primary key,
stage_id integer not null,
scheduled_at timestamp not null,
winner_id integer null,
status text not null default 'finished' -- just to give extra context
);
create table teams (
id integer primary key
);
create table match_teams (
match_id integer,
team_id integer,
constraint fk_mt_m foreign key (match_id) references matches(id),
constraint fk_mt_t foreign key (team_id) references teams(id)
);
insert into teams(id) values(1),(2);
insert into matches(id, stage_id, scheduled_at, winner_id) values
(1, 1, now() - interval '1 day', 1),
(2, 1, now() - interval '2 days', 1),
(3, 1, now() - interval '3 days', 2),
(4, 1, now() - interval '4 days', 1),
(5, 1, now() - interval '5 days', null);
insert into match_teams(match_id, team_id) values
(1, 1),
(1, 2),
(2, 1),
(2, 2),
(3, 1),
(3, 2),
(4, 1),
(4, 2),
(5, 1),
(5, 2);
Query itself:
with v_mto as (
SELECT
m.id,
m."stage_id",
mt."team_id",
m."scheduled_at",
(
case
when m."winner_id" IS NULL then 0
when (m."winner_id" = mt."team_id") then 1
else -1
end
) win
FROM matches m
INNER JOIN match_teams mt ON m.id = mt."match_id"
WHERE m.status = 'finished'
ORDER BY "stage_id", "team_id", "scheduled_at" desc
),
v_lag as (
select
"stage_id",
"team_id",
win,
lag(win, 1, win) over (partition by "stage_id", "team_id" order by "scheduled_at" desc ) lag_win,
first_value(win) over (partition by "stage_id", "team_id" order by "scheduled_at" desc ) first_win
from v_mto
)
select
"stage_id",
"team_id",
v_lag.win,
count(1)
from v_lag
where v_lag.win = v_lag.lag_win and v_lag.win = v_lag.first_win
group by 1, 2, 3
-- This is the query for the final table (on a screenshot)
-- with team_scores as (
-- select
-- m."tournamentStageId",
-- "teamId",
-- sum(
-- -- each win gives 3 score, each draft gives 1 score
-- coalesce((m."winner_id" = mt."team_id")::integer, 0) * 3
-- +
-- (m."winner_id" IS NULL)::int
-- ) as score
-- from matches m
-- inner join match_teams mt on m.id = mt."matchId"
-- where m.status = 1
-- group by m."tournamentStageId", "teamId")
-- select
-- "tournamentStageId",
-- "teamId",
-- t.name,
-- score,
-- dense_rank() over (partition by "tournamentStageId" order by score desc) rank,
-- row_number() over (partition by "tournamentStageId" order by t.name) position
-- -- total number of wins/losses/drafts to be added (the "score" column from the screenshot)
-- from team_scores ts
-- inner join teams t on t.id = ts."teamId"
-- order by "tournamentStageId", rank, position
I've created a sandbox for those who is brave enough to get a deep dive into the task: https://www.db-fiddle.com/f/6jsFFnxQMKwNQWznR3VXHC/2
Also, I've already crafted the part that creates a list of teams together with scores and points, so the attached query will be used as a joined one or sub-select.
Query plan on the real database and query (some indexes, probably, are missing, but that's ok for this moment):
GroupAggregate (cost=24862.28..29423.68 rows=3 width=24)
" Group Key: v_lag.""computerGameId"", v_lag.""tournamentStageId"", v_lag.""teamId"", v_lag.win"
-> Incremental Sort (cost=24862.28..29423.61 rows=3 width=16)
" Sort Key: v_lag.""computerGameId"", v_lag.""tournamentStageId"", v_lag.""teamId"", v_lag.win"
" Presorted Key: v_lag.""computerGameId"", v_lag.""tournamentStageId"", v_lag.""teamId"""
-> Subquery Scan on v_lag (cost=22581.67..29423.47 rows=3 width=16)
Filter: ((v_lag.win = v_lag.lag_win) AND (v_lag.lag_win = v_lag.first_win))
-> WindowAgg (cost=22581.67..27468.67 rows=130320 width=32)
-> Subquery Scan on v_mto (cost=22581.67..24210.67 rows=130320 width=24)
-> Sort (cost=22581.67..22907.47 rows=130320 width=28)
" Sort Key: m.""computerGameId"", m.""tournamentStageId"", mt.""teamId"", m.""scheduledAt"" DESC"
-> Hash Join (cost=3863.39..8391.38 rows=130320 width=28)
" Hash Cond: (mt.""matchId"" = m.id)"
-> Seq Scan on match_teams mt (cost=0.00..2382.81 rows=137281 width=8)
-> Hash (cost=2658.10..2658.10 rows=65623 width=24)
-> Seq Scan on matches m (cost=0.00..2658.10 rows=65623 width=24)
Filter: (status = 1)
Thanks everyone for help and suggestions!
P.S. it is possible to convert the first query (v_mto
) as materialised view or de-normalise win
into the match_teams
table, as this piece will be used in different queries to build match/game stats.
So, the original query is wrong - gives incorrect result for the standings results.
I've moved to row_number
math to solve this task.
The final query (with scores) looks like this:
create materialized view vm_tournament_stage_standings as
with v_mto as (SELECT m.id,
m."computerGameId",
m."tournamentStageId",
mt."teamId",
m."scheduledAt",
(
case
when m."winnerId" IS NULL then 'D'
when m."winnerId" = mt."teamId" then 'W'
else 'L'
end
) win
FROM matches m
INNER JOIN match_teams mt ON
m.id = mt."matchId"
WHERE m.status = 1),
v_streaks as (select "computerGameId",
"tournamentStageId",
"teamId",
row_number()
over grp_ord_matches
- row_number()
over (partition by "computerGameId", "tournamentStageId", "teamId", win order by "scheduledAt" desc ) streak_index,
win
from v_mto
window grp_ord_matches as (partition by "computerGameId", "tournamentStageId", "teamId" order by "scheduledAt" desc)),
v_streak as (select "computerGameId",
"tournamentStageId",
"teamId",
count(1) || win as streak
from v_streaks
where streak_index = 0
group by "computerGameId", "tournamentStageId", "teamId", "win"),
team_scores as (select m."tournamentStageId",
"teamId",
sum((m."winnerId" = mt."teamId")::int) as wins,
sum((m."winnerId" is null)::int) draws,
sum((m."winnerId" <> mt."teamId")::int) loses,
sum(
coalesce((m."winnerId" = mt."teamId")::integer, 0) * 3
+
(m."winnerId" IS NULL)::int
) as score
from matches m
inner join match_teams mt on m.id = mt."matchId"
where m.status = 1
group by m."tournamentStageId", "teamId")
select ts."tournamentStageId",
ts."teamId",
score,
wins,
draws,
loses,
vs.streak as streak,
dense_rank() over (partition by ts."tournamentStageId" order by score desc) rank,
row_number() over (partition by ts."tournamentStageId" order by t.name) position
from team_scores ts
inner join teams t on t.id = ts."teamId"
inner join v_streak vs on vs."teamId" = t.id and vs."tournamentStageId" = ts."tournamentStageId"
order by "tournamentStageId", rank, position