Search code examples
pythonpandasdataframefunctioncolumnsorting

Problem writing a function that adds Columns for Transfer from and to club


I have a problem with a project of mine. I'm trying to make a clear overview of transfers in Football and I currently have this table:

ClubID PlayerID FromDate ToDate TeamName c_Person
1 1 2010-01-01 2012-01-01 Club A Player 1
2 1 2012-02-01 2015-02-01 Club B Player 1
3 1 2015-05-01 2018-02-01 Club C Player 1
1 2 2010-01-01 2018-02-02 Club A Player 2
1 2 2018-03-02 2020-02-01 Club A Player 2

However, I want to add columns FromClub & ToClub. If Player 1 first plays for Club A from 2010-01-01 to 2012-01-01 and transfers and plays for Club B from 2012-02-01 to 2015-02-01, I want the 'FromClub' and 'ToClub' illustrate the transfer.

I would like the table to look like this:

ClubID PlayerID FromDate ToDate TeamName c_Person FromClub ToClub
1 1 2010-01-01 2012-01-01 Club A Player1 Nan Nan
2 1 2012-02-01 2015-02-01 Club B Player 1 Club A Club B
3 1 2015-05-01 2018-02-01 Club C Player 1 Club B Club C
1 2 2010-01-01 2018-02-02 Club A Player 2 Nan Nan
1 2 2018-03-02 2020-02-01 Club A Player 2 Nan Nan

I've been trying to write a function but am not able to solve it. Hopefully someone else can help me with this problem.

Here is the code to create the first Table:

import pandas as pd
from datetime import datetime

df = pd.DataFrame({'ClubID':[1, 2, 3, 1, 1],
                  'PlayerID':[1, 1, 1, 2, 2],
                  'FromDate':["2010-01-01", "2012-02-01", "2015-05-01", "2010-01-01", "2018-03-02"],
                  'ToDate':["2012-01-01", "2015-02-01", "2018-02-01", "2018-02-02", "2020-02-01"],
                  'TeamName':["Club A", "Club B", "Club C",  "Club A", "Club A"],
                  'c_Person':["Player 1", "Player 1", "Player 1", "Player 2", "Player 2"]})

# convert the 'Date' columns to datetime format
df['FromDate']= pd.to_datetime(df['FromDate'])
df['ToDate']= pd.to_datetime(df['ToDate'])

Thanks in advance!


Solution

  • First for each row in your dataframe, include the information of previous team each Player had been before transfer:

    df['PreviousTeam'] = df.groupby('PlayerID')['TeamName'].shift()
    
    >>> df
       ClubID    FromDate  PlayerID TeamName      ToDate  c_Person PreviousTeam
    0       1  2010-01-01         1   Club A  2012-01-01  Player 1          NaN
    1       2  2012-02-01         1   Club B  2015-02-01  Player 1       Club A
    2       3  2015-05-01         1   Club C  2018-02-01  Player 1       Club B
    3       1  2010-01-01         2   Club A  2018-02-02  Player 2          NaN
    4       1  2018-03-02         2   Club A  2020-02-01  Player 2       Club A
    

    However if the Player gets transfered to same team, the previous team is same as current team (row #4). So apply below operation to fix that:

    df['FromClub'] = df[df['PreviousTeam'] != df['TeamName']]['PreviousTeam']
    

    Finally ToClub column can be obtained from FromClub by observing when the player was transferred:

    df['ToClub'] = df[~df['FromClub'].isna()]['TeamName']
    
    >>> df.drop('PreviousTeam', axis=1)
          ClubID    FromDate  PlayerID TeamName      ToDate  c_Person FromClub  ToClub
    0       1  2010-01-01         1   Club A  2012-01-01  Player 1      NaN     NaN
    1       2  2012-02-01         1   Club B  2015-02-01  Player 1   Club A  Club B
    2       3  2015-05-01         1   Club C  2018-02-01  Player 1   Club B  Club C
    3       1  2010-01-01         2   Club A  2018-02-02  Player 2      NaN     NaN
    4       1  2018-03-02         2   Club A  2020-02-01  Player 2      NaN     NaN
    

    So putting all inside a function, you can just call below with your dataframe and get the desired output:

    def fill_club_details(df):
        df['PreviousTeam'] = df.groupby('PlayerID')['TeamName'].shift()
        df['FromClub'] = df[df['PreviousTeam'] != df['TeamName']]['PreviousTeam']
        df['ToClub'] = df[~df['FromClub'].isna()]['TeamName']
        return df.drop('PreviousTeam', axis=1)