I'm solo learning football data science, and want to analyse some things in the turkish league. Now I'm trying to make some analyses from the past 5 games of both teams from the upcoming games. First I've already made a dataframe with the games of the season. Now I want to make a indicator of points of goals from each team, but since of the start from the season. The ideia is: I find how many goals the home team scored in the past 5 matches, goals scored at home count as 1 points and goals scored away count as 2 points. I make the same for the away team and for the conceded goals. At the moment, when I filter the last 5 home team games, I only get his last 5 at home, and not the last five overall.
This is my code so far
import pandas as pd
webs = pd.read_csv('https://www.football-data.co.uk/mmz4281/2223/T1.csv')
tk = webs[['Date','HomeTeam','AwayTeam','FTHG','FTAG','FTR']]
tk.columns = ['Data','Home','Away','FT_Gols_H','FT_Gols_A','FT_Result']
tk['Goal_Points'] = tk.groupby('Home')['FT_Gols_H'].rolling(window=5, min_periods=1).sum().reset_index(0,drop=True).shift(1)
I'd suggest reformatting your data a little. Make the dataframe into one with 3 columns: the date, the team, and the goals they got on that date, no matter whether they were at home or away.
Example:
home_team_games = tk[['Data', 'Home', 'FT_Gols_H']].rename(columns={
'Home': 'Team',
'FT_Gols_H': 'Goals',
})
away_team_games = tk[['Data', 'Away', 'FT_Gols_A']].rename(columns={
'Away': 'Team',
'FT_Gols_A': 'Goals',
})
tk = pd.concat([home_team_games, away_team_games])
tk = tk.sort_values('Data')
Then you have a dataframe like this:
Data Team Goals
193 01/02/2023 Antalyaspor 0
192 01/02/2023 Umraniyespor 2
191 01/02/2023 Giresunspor 1
190 01/02/2023 Hatayspor 1
193 01/02/2023 Trabzonspor 2
.. ... ... ...
189 31/01/2023 Karagumruk 1
107 31/10/2022 Antalyaspor 2
107 31/10/2022 Sivasspor 0
106 31/10/2022 Giresunspor 1
106 31/10/2022 Buyuksehyr 3
I had to modify your code a little bit to get it to work on this data frame. I changed the name of the columns, and I also had to drop both levels of the index and not just the first level.
tk['Goals_Prev_5_Games'] = tk.groupby('Team')['Goals'].rolling(window=5, min_periods=1).sum().reset_index(drop=True).shift(1)