Search code examples
postgresqlpostgresql-14

Optimise query: count latest win/lose streak for all teams


I'm not an expert in data-warehousing nor analytics, so I give birth to a monster-query that I'd like to optimise (if possible).

The problem is: I need to display the stagings table for a given tournament. This table should display team id, total score, team position (i.e. bucket for same-score teams), order and latest (!) win-lose strike.

I.e. for the following sequence (most recent first) WWLLLWLD I should get 2W

W = win, L = lose, D = draft

Schema

create table matches (
  id integer primary key,
  stage_id integer not null,
  scheduled_at timestamp not null,
  winner_id integer null,
  status text not null default 'finished' -- just to give extra context
);

create table teams (
  id integer primary key
);

create table match_teams (
  match_id integer,
  team_id integer,
  constraint fk_mt_m foreign key (match_id) references matches(id),
  constraint fk_mt_t foreign key (team_id) references teams(id)
);

insert into teams(id) values(1),(2);

insert into matches(id, stage_id, scheduled_at, winner_id) values
    (1, 1, now() - interval '1 day', 1),
    (2, 1, now() - interval '2 days', 1),
    (3, 1, now() - interval '3 days', 2),
    (4, 1, now() - interval '4 days', 1),
    (5, 1, now() - interval '5 days', null);
    
insert into match_teams(match_id, team_id) values
    (1, 1),
    (1, 2),
    (2, 1),
    (2, 2),
    (3, 1), 
    (3, 2),
    (4, 1),
    (4, 2),
    (5, 1),
    (5, 2);

Query itself:

with v_mto as (
  SELECT
    m.id,
    m."stage_id",
    mt."team_id",
    m."scheduled_at",
    (
      case
        when m."winner_id" IS NULL then 0
        when (m."winner_id" = mt."team_id") then 1
        else -1
      end
    ) win
  FROM matches m
  INNER JOIN match_teams mt ON m.id = mt."match_id"
  WHERE m.status = 'finished'
  ORDER BY "stage_id", "team_id", "scheduled_at" desc
),
v_lag as (
  select
    "stage_id",
    "team_id",
    win,
    lag(win, 1, win) over (partition by "stage_id", "team_id" order by "scheduled_at" desc ) lag_win,
    first_value(win) over (partition by "stage_id", "team_id" order by "scheduled_at" desc ) first_win
  from v_mto
)
select
  "stage_id",
  "team_id",
  v_lag.win,
  count(1)
from v_lag
where v_lag.win = v_lag.lag_win and v_lag.win = v_lag.first_win
group by 1, 2, 3


-- This is the query for the final table (on a screenshot)
-- with team_scores as (
--   select
--     m."tournamentStageId",
--     "teamId",
--     sum(
--       -- each win gives 3 score, each draft gives 1 score
--       coalesce((m."winner_id" = mt."team_id")::integer, 0) * 3
--       +
--       (m."winner_id" IS NULL)::int
--     ) as score
--   from matches m
--   inner join match_teams mt on m.id = mt."matchId"
--   where m.status = 1
--   group by m."tournamentStageId", "teamId")
-- select
--   "tournamentStageId",
--   "teamId",
--   t.name,
--   score,
--   dense_rank() over (partition by "tournamentStageId" order by score desc) rank,
--   row_number() over (partition by "tournamentStageId" order by t.name) position
-- -- total number of wins/losses/drafts to be added (the "score" column from the screenshot)
-- from team_scores ts
-- inner join teams t on t.id = ts."teamId"
-- order by "tournamentStageId", rank, position


I've created a sandbox for those who is brave enough to get a deep dive into the task: https://www.db-fiddle.com/f/6jsFFnxQMKwNQWznR3VXHC/2

Also, I've already crafted the part that creates a list of teams together with scores and points, so the attached query will be used as a joined one or sub-select.

Query plan on the real database and query (some indexes, probably, are missing, but that's ok for this moment):

GroupAggregate  (cost=24862.28..29423.68 rows=3 width=24)
"  Group Key: v_lag.""computerGameId"", v_lag.""tournamentStageId"", v_lag.""teamId"", v_lag.win"
  ->  Incremental Sort  (cost=24862.28..29423.61 rows=3 width=16)
"        Sort Key: v_lag.""computerGameId"", v_lag.""tournamentStageId"", v_lag.""teamId"", v_lag.win"
"        Presorted Key: v_lag.""computerGameId"", v_lag.""tournamentStageId"", v_lag.""teamId"""
        ->  Subquery Scan on v_lag  (cost=22581.67..29423.47 rows=3 width=16)
              Filter: ((v_lag.win = v_lag.lag_win) AND (v_lag.lag_win = v_lag.first_win))
              ->  WindowAgg  (cost=22581.67..27468.67 rows=130320 width=32)
                    ->  Subquery Scan on v_mto  (cost=22581.67..24210.67 rows=130320 width=24)
                          ->  Sort  (cost=22581.67..22907.47 rows=130320 width=28)
"                                Sort Key: m.""computerGameId"", m.""tournamentStageId"", mt.""teamId"", m.""scheduledAt"" DESC"
                                ->  Hash Join  (cost=3863.39..8391.38 rows=130320 width=28)
"                                      Hash Cond: (mt.""matchId"" = m.id)"
                                      ->  Seq Scan on match_teams mt  (cost=0.00..2382.81 rows=137281 width=8)
                                      ->  Hash  (cost=2658.10..2658.10 rows=65623 width=24)
                                            ->  Seq Scan on matches m  (cost=0.00..2658.10 rows=65623 width=24)
                                                  Filter: (status = 1)

Thanks everyone for help and suggestions!

The final result: The final result

P.S. it is possible to convert the first query (v_mto) as materialised view or de-normalise win into the match_teams table, as this piece will be used in different queries to build match/game stats.


Solution

  • So, the original query is wrong - gives incorrect result for the standings results.

    I've moved to row_number math to solve this task.

    The final query (with scores) looks like this:

    create materialized view vm_tournament_stage_standings as
    with v_mto as (SELECT m.id,
                          m."computerGameId",
                          m."tournamentStageId",
                          mt."teamId",
                          m."scheduledAt",
                          (
                              case
                                  when m."winnerId" IS NULL then 'D'
                                  when m."winnerId" = mt."teamId" then 'W'
                                  else 'L'
                                  end
                              ) win
                   FROM matches m
                            INNER JOIN match_teams mt ON
                       m.id = mt."matchId"
                   WHERE m.status = 1),
         v_streaks as (select "computerGameId",
                              "tournamentStageId",
                              "teamId",
                              row_number()
                              over grp_ord_matches
                                  - row_number()
                                    over (partition by "computerGameId", "tournamentStageId", "teamId", win order by "scheduledAt" desc ) streak_index,
                              win
                       from v_mto
                           window grp_ord_matches as (partition by "computerGameId", "tournamentStageId", "teamId" order by "scheduledAt" desc)),
         v_streak as (select "computerGameId",
                             "tournamentStageId",
                             "teamId",
                             count(1) || win as streak
                      from v_streaks
                      where streak_index = 0
                      group by "computerGameId", "tournamentStageId", "teamId", "win"),
         team_scores as (select m."tournamentStageId",
                                "teamId",
                                sum((m."winnerId" = mt."teamId")::int) as wins,
                                sum((m."winnerId" is null)::int)          draws,
                                sum((m."winnerId" <> mt."teamId")::int)   loses,
                                sum(
                                                coalesce((m."winnerId" = mt."teamId")::integer, 0) * 3
                                            +
                                                (m."winnerId" IS NULL)::int
                                    )                                  as score
                         from matches m
                                  inner join match_teams mt on m.id = mt."matchId"
                         where m.status = 1
                         group by m."tournamentStageId", "teamId")
    select ts."tournamentStageId",
           ts."teamId",
           score,
           wins,
           draws,
           loses,
           vs.streak as                                                                streak,
           dense_rank() over (partition by ts."tournamentStageId" order by score desc) rank,
           row_number() over (partition by ts."tournamentStageId" order by t.name)     position
    from team_scores ts
             inner join teams t on t.id = ts."teamId"
             inner join v_streak vs on vs."teamId" = t.id and vs."tournamentStageId" = ts."tournamentStageId"
    order by "tournamentStageId", rank, position