I have a PostgreSQL table that is mostly a bridge table but it also has some extra stuff.
Essentially it holds the information about players in a game. So we have a unique id for this instance of a player in a game. Then an id that is FK to game table, and an id that is FK to player table. There is also some other irrelevant stuff. Something like this:
Table players_games
| id | 12564
| player_id | 556
| game_id | 156184
What I want to do is find how many occurrences there are of a player playing with another. So, if player1 is in the same game as player2, they have played together once. There are 2+ players in a game.
So what I want to do is populate a new table, that holds three values: player_lo, player_hi, times_played.
And either have one row for each pair and the number of times they played, or if it ends up being more efficient, a row for each iteration and have the value set as 1 so these can be added together later, maybe distributed. So you might see something like:
p1, p2, 1
p1, p2, 1
And these get reduced later to:
p1, p2, 2
So I was wondering if there was some clever way to do this with SQL, or if there's SQL that can reduce my programming effort, before starting to write a slightly complex python script to do it.
To do this, you need to do a self join on the player_games table. The first subquery is for the first player, and the second for the second player. The "first" player is the one with the lower player id.
select pg1.player_id as player1, pg2.player_id as player2, count(*) as num_games
from (select distinct game_id, player_id
from players_games pg
) pg1 join
(select distinct game_id, player_id
from players_games pg
) pg2
on pg1.game_id = pg2.game_id and
pg1.player_id < pg2.player_id
group by pg1.player_id, pg2.player_id
Note that the join condition uses a "<" on the player ids. This is to prevent counting duplicates (so players A,B are not also counted as B,A).
Also, I added a "distinct" in the inner subqueries just in case a single player might appear more than once for a given game. Perhaps this is not necessary. To be sure, you should have a unique index on the composite key game_id, player_id.