Search code examples
pythonpandasdata-science

Is there a Python function that allows me to sum the last goals of a team in a dataframe?


I'm solo learning football data science, and want to analyse some things in the turkish league. Now I'm trying to make some analyses from the past 5 games of both teams from the upcoming games. First I've already made a dataframe with the games of the season. Now I want to make a indicator of points of goals from each team, but since of the start from the season. The ideia is: I find how many goals the home team scored in the past 5 matches, goals scored at home count as 1 points and goals scored away count as 2 points. I make the same for the away team and for the conceded goals. At the moment, when I filter the last 5 home team games, I only get his last 5 at home, and not the last five overall.

This is my code so far

import pandas as pd

webs = pd.read_csv('https://www.football-data.co.uk/mmz4281/2223/T1.csv')
tk = webs[['Date','HomeTeam','AwayTeam','FTHG','FTAG','FTR']]
tk.columns = ['Data','Home','Away','FT_Gols_H','FT_Gols_A','FT_Result']

tk['Goal_Points'] = tk.groupby('Home')['FT_Gols_H'].rolling(window=5, min_periods=1).sum().reset_index(0,drop=True).shift(1)

Solution

  • I'd suggest reformatting your data a little. Make the dataframe into one with 3 columns: the date, the team, and the goals they got on that date, no matter whether they were at home or away.

    Example:

    home_team_games = tk[['Data', 'Home', 'FT_Gols_H']].rename(columns={
        'Home': 'Team',
        'FT_Gols_H': 'Goals',
    })
    away_team_games = tk[['Data', 'Away', 'FT_Gols_A']].rename(columns={
        'Away': 'Team',
        'FT_Gols_A': 'Goals',
    })
    tk = pd.concat([home_team_games, away_team_games])
    tk = tk.sort_values('Data')
    

    Then you have a dataframe like this:

               Data          Team  Goals
    193  01/02/2023   Antalyaspor      0
    192  01/02/2023  Umraniyespor      2
    191  01/02/2023   Giresunspor      1
    190  01/02/2023     Hatayspor      1
    193  01/02/2023   Trabzonspor      2
    ..          ...           ...    ...
    189  31/01/2023    Karagumruk      1
    107  31/10/2022   Antalyaspor      2
    107  31/10/2022     Sivasspor      0
    106  31/10/2022   Giresunspor      1
    106  31/10/2022    Buyuksehyr      3
    

    I had to modify your code a little bit to get it to work on this data frame. I changed the name of the columns, and I also had to drop both levels of the index and not just the first level.

    tk['Goals_Prev_5_Games'] = tk.groupby('Team')['Goals'].rolling(window=5, min_periods=1).sum().reset_index(drop=True).shift(1)