Tags: python, numpy, loops, probability, modeling

Simulating a football league in python


I could use some help on this. I want to simulate a football league in python for an arbitrary number of teams and tally the points over a season in a table. The rules are simple:

  1. Every team in the league plays every other team twice, so each team plays 2*(Nteams_in_league - 1) games per season.
  2. Each team has a 50% chance of winning any given game.
  3. There are only two possible outcomes: win or lose.
  4. A win earns a team 3 points and a loss earns 0 points.
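These rules fix a few quantities that any correct simulation must reproduce. The small helper below (the name `season_invariants` is hypothetical) computes them. With 8 teams, each team plays 14 games, so no team can score more than 42 points in a season, and every season hands out exactly 168 points in total. Most of the tallies in the example table below exceed 42, which is consistent with the note that the attempt does not allocate points correctly.

```python
def season_invariants(n_teams):
    """Quantities implied by the rules for one double round-robin season."""
    games_per_team = 2 * (n_teams - 1)      # rule 1: two games against each opponent
    total_games = n_teams * (n_teams - 1)   # C(n, 2) pairs, each playing twice
    max_points = 3 * games_per_team         # a team that wins every game
    total_points = 3 * total_games          # exactly one winner per game, 3 points each
    return games_per_team, total_games, max_points, total_points

print(season_invariants(8))  # (14, 56, 42, 168)
```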

Here's an example of the output I'm looking for, with a league of 8 teams over 11 seasons. It's based on an attempt I made, but it isn't completely correct because it doesn't allocate points between the winner and the loser correctly.

Columns are seasons, rows are teams, and each entry is that team's points tally for the season.

1 2 3 4 5 6 7 8 9 10 11
1 57 51 66 54 60 51 57 54 45 72
2 51 51 42 51 66 60 63 60 81 63
3 51 69 51 48 36 48 57 54 48 60
4 54 57 66 54 75 60 60 66 69 42
5 72 57 63 57 60 54 48 66 54 42
6 54 45 54 45 60 57 51 60 66 51
7 51 63 72 63 63 54 60 63 54 66
8 66 57 42 57 51 57 51 75 72 60

Solution

  • Here is one approach. It simulates each season independently: for each season and each pair of teams, it simulates the outcomes of their two games, assuming each team has a 50% chance of winning each one.

    import numpy as np
    import pandas as pd
    from itertools import combinations
    
    def simulate_naive(n_teams):
        """Simulate a single season and return each team's points."""
        scores = np.zeros(n_teams, dtype=int)
        for i, j in combinations(range(n_teams), 2):
            # each pair of teams plays twice, each time with a 50/50 chance of
            # either team winning; the winning team gets three points
            scores[i if np.random.rand() < 0.5 else j] += 3
            scores[i if np.random.rand() < 0.5 else j] += 3

        return scores
    
    n_teams = 8
    n_seasons = 10
    df = pd.DataFrame({season: simulate_naive(n_teams) for season in range(n_seasons)})
    print(df)
    #     0   1   2   3   4   5   6   7   8   9
    # 0  15  30  21  12  24  24   9  21  18  33
    # 1  21  18  24  24  15  21  12  30  18  21
    # 2  21  27  21  18  21  27  27  15  12  24
    # 3  27  12   9  36  18  12  30  15  24  21
    # 4  24  24  27  24  18  18  33  18  30  15
    # 5  18  15  21  15  15  27  15  24  24  15
    # 6  18  18  30  21  33  21  24  27  18  21
    # 7  24  24  15  18  24  18  18  18  24  18
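The per-game `np.random.rand()` calls in the loop above can also be replaced by array operations. This is a minimal sketch, assuming NumPy's `Generator` API (`np.random.default_rng`); the function name `simulate_vectorized` is hypothetical. It still simulates every game, but it draws the number of games team i wins out of its two meetings with team j as a single Binomial(2, 0.5) value, which has the same distribution as two independent fair games, and it tallies all seasons at once.

```python
import numpy as np

def simulate_vectorized(n_teams, n_seasons, seed=None):
    """Simulate every game of every season with array operations (sketch).

    Returns an integer array of shape (n_teams, n_seasons): rows are teams,
    columns are seasons, entries are season points.
    """
    rng = np.random.default_rng(seed)
    # All unordered pairs of teams (i, j) with i < j.
    i, j = np.triu_indices(n_teams, k=1)
    # Games team i wins out of its two meetings with team j, per season.
    # Binomial(2, 0.5) has the same distribution as two independent fair games.
    wins_ij = rng.binomial(2, 0.5, size=(n_seasons, i.size))
    # wins[s, a, b] = games team a won against team b in season s.
    wins = np.zeros((n_seasons, n_teams, n_teams), dtype=int)
    wins[:, i, j] = wins_ij
    wins[:, j, i] = 2 - wins_ij  # team j wins whichever games team i lost
    # Three points per win; sum over opponents, then put teams on the rows.
    points = 3 * wins.sum(axis=2)
    return points.T

table = simulate_vectorized(n_teams=8, n_seasons=10, seed=0)
print(table)
```

Wrapping the result as `pd.DataFrame(table)` gives the same teams-by-seasons layout as `df` above. Memory grows with n_seasons × n_teams², which is fine for league-sized inputs.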
    

    I wonder if there is a nicer statistical approach that avoids simulating each game.
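For one team in isolation there is a closed form that needs no simulation. Each team plays m = 2*(n_teams - 1) independent fair games, so its number of wins is Binomial(m, 0.5) and its season points are 3 times that. The sketch below (the name `points_distribution` is hypothetical) computes this exact distribution using only the standard library. One caveat: teams' tallies are not independent, because every season hands out exactly 3 * n_teams * (n_teams - 1) points in total. Drawing each team's points independently from this distribution therefore reproduces the per-team marginals but not a consistent league table.

```python
from math import comb

def points_distribution(n_teams):
    """Exact marginal distribution of one team's season points (sketch).

    A team plays m = 2 * (n_teams - 1) independent fair games, so its number
    of wins is Binomial(m, 0.5) and its points are 3 * wins.
    """
    m = 2 * (n_teams - 1)
    return {3 * k: comb(m, k) / 2**m for k in range(m + 1)}

dist = points_distribution(8)
mean = sum(p * prob for p, prob in dist.items())
var = sum((p - mean) ** 2 * prob for p, prob in dist.items())
print(mean, var)  # 21.0 31.5
```

The mean 3 * (n_teams - 1) and variance 9 * m / 4 follow directly from the binomial's mean m/2 and variance m/4, scaled by 3 points per win.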