I have the following data definition about a football game:
import datetime
from collections import defaultdict, namedtuple

Game = namedtuple('Game', ['Date', 'Home', 'Away', 'HomeShots', 'AwayShots',
                           'HomeBT', 'AwayBT', 'HomeCrosses', 'AwayCrosses',
                           'HomeCorners', 'AwayCorners', 'HomeGoals',
                           'AwayGoals', 'HomeXG', 'AwayXG'])
Here are some examples:
[Game(Date=datetime.date(2018, 10, 21), Home='Everton', Away='Crystal Palace', HomeShots='21', AwayShots='6', HomeBT='22', AwayBT='13', HomeCrosses='21', AwayCrosses='14', HomeCorners='10', AwayCorners='5', HomeGoals='2', AwayGoals='0', HomeXG='1.93', AwayXG='1.5'),
Game(Date=datetime.date(2019, 2, 27), Home='Man City', Away='West Ham', HomeShots='20', AwayShots='2', HomeBT='51', AwayBT='6', HomeCrosses='34', AwayCrosses='5', HomeCorners='12', AwayCorners='2', HomeGoals='1', AwayGoals='0', HomeXG='3.68', AwayXG='0.4'),
Game(Date=datetime.date(2019, 2, 9), Home='Fulham', Away='Man Utd', HomeShots='12', AwayShots='15', HomeBT='19', AwayBT='38', HomeCrosses='20', AwayCrosses='12', HomeCorners='5', AwayCorners='4', HomeGoals='0', AwayGoals='3', HomeXG='2.19', AwayXG='2.13'),
Game(Date=datetime.date(2019, 3, 9), Home='Southampton', Away='Tottenham', HomeShots='12', AwayShots='15', HomeBT='13', AwayBT='17', HomeCrosses='15', AwayCrosses='15', HomeCorners='1', AwayCorners='10', HomeGoals='2', AwayGoals='1', HomeXG='2.08', AwayXG='1.27'),
Game(Date=datetime.date(2018, 9, 22), Home='Man Utd', Away='Wolverhampton', HomeShots='16', AwayShots='11', HomeBT='17', AwayBT='17', HomeCrosses='26', AwayCrosses='13', HomeCorners='5', AwayCorners='4', HomeGoals='1', AwayGoals='1', HomeXG='0.62', AwayXG='1.12')]
I also have two almost identical functions that calculate home and away statistics for a given team:
def calculate_home_stats(team, games):
    """
    Calculates home stats for the given team.
    """
    home_stats = defaultdict(float)
    home_stats['HomeShotsFor'] = sum(int(game.HomeShots) for game in games if game.Home == team)
    home_stats['HomeShotsAgainst'] = sum(int(game.AwayShots) for game in games if game.Home == team)
    home_stats['HomeBoxTouchesFor'] = sum(int(game.HomeBT) for game in games if game.Home == team)
    home_stats['HomeBoxTouchesAgainst'] = sum(int(game.AwayBT) for game in games if game.Home == team)
    home_stats['HomeCrossesFor'] = sum(int(game.HomeCrosses) for game in games if game.Home == team)
    home_stats['HomeCrossesAgainst'] = sum(int(game.AwayCrosses) for game in games if game.Home == team)
    home_stats['HomeCornersFor'] = sum(int(game.HomeCorners) for game in games if game.Home == team)
    home_stats['HomeCornersAgainst'] = sum(int(game.AwayCorners) for game in games if game.Home == team)
    home_stats['HomeGoalsFor'] = sum(int(game.HomeGoals) for game in games if game.Home == team)
    home_stats['HomeGoalsAgainst'] = sum(int(game.AwayGoals) for game in games if game.Home == team)
    home_stats['HomeXGoalsFor'] = sum(float(game.HomeXG) for game in games if game.Home == team)
    home_stats['HomeXGoalsAgainst'] = sum(float(game.AwayXG) for game in games if game.Home == team)
    home_stats['HomeGames'] = sum(1 for game in games if game.Home == team)
    return home_stats

def calculate_away_stats(team, games):
    """
    Calculates away stats for the given team.
    """
    away_stats = defaultdict(float)
    away_stats['AwayShotsFor'] = sum(int(game.AwayShots) for game in games if game.Away == team)
    away_stats['AwayShotsAgainst'] = sum(int(game.HomeShots) for game in games if game.Away == team)
    away_stats['AwayBoxTouchesFor'] = sum(int(game.AwayBT) for game in games if game.Away == team)
    away_stats['AwayBoxTouchesAgainst'] = sum(int(game.HomeBT) for game in games if game.Away == team)
    away_stats['AwayCrossesFor'] = sum(int(game.AwayCrosses) for game in games if game.Away == team)
    away_stats['AwayCrossesAgainst'] = sum(int(game.HomeCrosses) for game in games if game.Away == team)
    away_stats['AwayCornersFor'] = sum(int(game.AwayCorners) for game in games if game.Away == team)
    away_stats['AwayCornersAgainst'] = sum(int(game.HomeCorners) for game in games if game.Away == team)
    away_stats['AwayGoalsFor'] = sum(int(game.AwayGoals) for game in games if game.Away == team)
    away_stats['AwayGoalsAgainst'] = sum(int(game.HomeGoals) for game in games if game.Away == team)
    away_stats['AwayXGoalsFor'] = sum(float(game.AwayXG) for game in games if game.Away == team)
    away_stats['AwayXGoalsAgainst'] = sum(float(game.HomeXG) for game in games if game.Away == team)
    away_stats['AwayGames'] = sum(1 for game in games if game.Away == team)
    return away_stats
I'm wondering if there is a way to abstract over these two functions and merge them into one without creating a wall of if/else statements to determine whether the team plays at home or away from home and which fields should be counted.
A cleaner data structure allows for simpler code. In this case, your data already contains duplication: every statistic is stored twice under separate field names (e.g., you have both HomeShots and AwayShots).
There are many possible answers to how you could structure data here. I'll just go over a solution that doesn't change too much from your original structure.
Statistics = namedtuple('Statistics', ['shots', 'BT', 'crosses', 'corners', 'goals', 'XG'])
Game = namedtuple('Game', ['home', 'away', 'date', 'home_stats', 'away_stats'])
You could then use it as follows (I haven't included all stats here, just a few to give an example):
def calculate_stats(games, team_name, home_stats_only=False, away_stats_only=False):
    home_stats = [g.home_stats._asdict() for g in games if g.home == team_name]
    away_stats = [g.away_stats._asdict() for g in games if g.away == team_name]

    if away_stats_only:
        input_stats = away_stats
    elif home_stats_only:
        input_stats = home_stats
    else:
        input_stats = home_stats + away_stats

    def sum_on_field(field_name):
        return sum(stats[field_name] for stats in input_stats)

    return {f: sum_on_field(f) for f in Statistics._fields}
This can then be used to get combined, home-only, or away-only stats:
from datetime import datetime  # needed for datetime.now() below

example_game_1 = Game(
    home='Burnley',
    away='Arsenal',
    date=datetime.now(),
    home_stats=Statistics(shots=12, BT=26, crosses=21, corners=4, goals=1, XG=1.73),
    away_stats=Statistics(shots=17, BT=26, crosses=22, corners=5, goals=3, XG=2.87),
)
example_game_2 = Game(
    home='Arsenal',
    away='Pessac',
    date=datetime.now(),
    home_stats=Statistics(shots=1, BT=1, crosses=1, corners=1, goals=1, XG=1),
    away_stats=Statistics(shots=2, BT=2, crosses=2, corners=2, goals=2, XG=2),
)
print(calculate_stats([example_game_1, example_game_2], 'Arsenal'))
print(calculate_stats([example_game_1, example_game_2], 'Arsenal', home_stats_only=True))
print(calculate_stats([example_game_1, example_game_2], 'Arsenal', away_stats_only=True))
Which prints:
{'shots': 18, 'BT': 27, 'crosses': 23, 'corners': 6, 'goals': 4, 'XG': 3.87}
{'shots': 1, 'BT': 1, 'crosses': 1, 'corners': 1, 'goals': 1, 'XG': 1}
{'shots': 17, 'BT': 26, 'crosses': 22, 'corners': 5, 'goals': 3, 'XG': 2.87}
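If you already have records in the original flat format, they need converting first, since the flat tuples store every number as a string. Here is a minimal sketch of one way to do that. The names `FlatGame`, `SUFFIXES`, `side_stats` and `to_nested` are hypothetical helpers made up for this illustration (the flat tuple is renamed only so it doesn't clash with the nested `Game`), and I'm assuming every stat except XG is a whole number, as in your examples.

```python
import datetime
from collections import namedtuple

# The flat record from the question, renamed (hypothetical name) to avoid a clash.
FlatGame = namedtuple('FlatGame', ['Date', 'Home', 'Away', 'HomeShots', 'AwayShots',
                                   'HomeBT', 'AwayBT', 'HomeCrosses', 'AwayCrosses',
                                   'HomeCorners', 'AwayCorners', 'HomeGoals',
                                   'AwayGoals', 'HomeXG', 'AwayXG'])
Statistics = namedtuple('Statistics', ['shots', 'BT', 'crosses', 'corners', 'goals', 'XG'])
Game = namedtuple('Game', ['home', 'away', 'date', 'home_stats', 'away_stats'])

# Maps each Statistics field to the suffix used in the flat field names.
SUFFIXES = {'shots': 'Shots', 'BT': 'BT', 'crosses': 'Crosses',
            'corners': 'Corners', 'goals': 'Goals', 'XG': 'XG'}


def side_stats(flat, side):
    """Build Statistics for one side ('Home' or 'Away') of a flat record."""
    values = {}
    for field, suffix in SUFFIXES.items():
        raw = getattr(flat, side + suffix)
        # Assumption: XG is fractional, every other stat is a whole number.
        values[field] = float(raw) if field == 'XG' else int(raw)
    return Statistics(**values)


def to_nested(flat):
    """Convert a flat record into the nested Game structure."""
    return Game(home=flat.Home, away=flat.Away, date=flat.Date,
                home_stats=side_stats(flat, 'Home'),
                away_stats=side_stats(flat, 'Away'))


flat = FlatGame(Date=datetime.date(2019, 2, 9), Home='Fulham', Away='Man Utd',
                HomeShots='12', AwayShots='15', HomeBT='19', AwayBT='38',
                HomeCrosses='20', AwayCrosses='12', HomeCorners='5', AwayCorners='4',
                HomeGoals='0', AwayGoals='3', HomeXG='2.19', AwayXG='2.13')
game = to_nested(flat)
print(game.away_stats)
```

For the Fulham vs Man Utd record this prints `Statistics(shots=15, BT=38, crosses=12, corners=4, goals=3, XG=2.13)`. The side is passed in as a plain string and used to build the field name, so the home/away distinction is handled in one place and never becomes a chain of if/else.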
When dealing with this kind of data, it's usually a good idea to use specialised tools like, for example, pandas. It could also be very convenient to use interactive tools, like JupyterLab.