I have a problem with a project of mine. I'm trying to make a clear overview of transfers in Football and I currently have this table:
ClubID | PlayerID | FromDate | ToDate | TeamName | c_Person |
---|---|---|---|---|---|
1 | 1 | 2010-01-01 | 2012-01-01 | Club A | Player 1 |
2 | 1 | 2012-02-01 | 2015-02-01 | Club B | Player 1 |
3 | 1 | 2015-05-01 | 2018-02-01 | Club C | Player 1 |
1 | 2 | 2010-01-01 | 2018-02-02 | Club A | Player 2 |
1 | 2 | 2018-03-02 | 2020-02-01 | Club A | Player 2 |
However, I want to add columns FromClub & ToClub. If Player 1 first plays for Club A from 2010-01-01 to 2012-01-01 and transfers and plays for Club B from 2012-02-01 to 2015-02-01, I want the 'FromClub' and 'ToClub' illustrate the transfer.
I would like the table to look like this:
ClubID | PlayerID | FromDate | ToDate | TeamName | c_Person | FromClub | ToClub |
---|---|---|---|---|---|---|---|
1 | 1 | 2010-01-01 | 2012-01-01 | Club A | Player1 | Nan | Nan |
2 | 1 | 2012-02-01 | 2015-02-01 | Club B | Player 1 | Club A | Club B |
3 | 1 | 2015-05-01 | 2018-02-01 | Club C | Player 1 | Club B | Club C |
1 | 2 | 2010-01-01 | 2018-02-02 | Club A | Player 2 | Nan | Nan |
1 | 2 | 2018-03-02 | 2020-02-01 | Club A | Player 2 | Nan | Nan |
I've been trying to write a function but am not able to solve it. Hopefully someone else can help me with this problem.
Here is the code to create the first Table:
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'ClubID':[1, 2, 3, 1, 1],
'PlayerID':[1, 1, 1, 2, 2],
'FromDate':["2010-01-01", "2012-02-01", "2015-05-01", "2010-01-01", "2018-03-02"],
'ToDate':["2012-01-01", "2015-02-01", "2018-02-01", "2018-02-02", "2020-02-01"],
'TeamName':["Club A", "Club B", "Club C", "Club A", "Club A"],
'c_Person':["Player 1", "Player 1", "Player 1", "Player 2", "Player 2"]})
# convert the 'Date' columns to datetime format
df['FromDate']= pd.to_datetime(df['FromDate'])
df['ToDate']= pd.to_datetime(df['ToDate'])
Thanks in advance!
First for each row in your dataframe, include the information of previous team each Player had been before transfer:
df['PreviousTeam'] = df.groupby('PlayerID')['TeamName'].shift()
>>> df
ClubID FromDate PlayerID TeamName ToDate c_Person PreviousTeam
0 1 2010-01-01 1 Club A 2012-01-01 Player 1 NaN
1 2 2012-02-01 1 Club B 2015-02-01 Player 1 Club A
2 3 2015-05-01 1 Club C 2018-02-01 Player 1 Club B
3 1 2010-01-01 2 Club A 2018-02-02 Player 2 NaN
4 1 2018-03-02 2 Club A 2020-02-01 Player 2 Club A
However if the Player gets transfered to same team, the previous team is same as current team (row #4). So apply below operation to fix that:
df['FromClub'] = df[df['PreviousTeam'] != df['TeamName']]['PreviousTeam']
Finally ToClub
column can be obtained from FromClub
by observing when the player was transferred:
df['ToClub'] = df[~df['FromClub'].isna()]['TeamName']
>>> df.drop('PreviousTeam', axis=1)
ClubID FromDate PlayerID TeamName ToDate c_Person FromClub ToClub
0 1 2010-01-01 1 Club A 2012-01-01 Player 1 NaN NaN
1 2 2012-02-01 1 Club B 2015-02-01 Player 1 Club A Club B
2 3 2015-05-01 1 Club C 2018-02-01 Player 1 Club B Club C
3 1 2010-01-01 2 Club A 2018-02-02 Player 2 NaN NaN
4 1 2018-03-02 2 Club A 2020-02-01 Player 2 NaN NaN
So putting all inside a function, you can just call below with your dataframe and get the desired output:
def fill_club_details(df):
df['PreviousTeam'] = df.groupby('PlayerID')['TeamName'].shift()
df['FromClub'] = df[df['PreviousTeam'] != df['TeamName']]['PreviousTeam']
df['ToClub'] = df[~df['FromClub'].isna()]['TeamName']
return df.drop('PreviousTeam', axis=1)