Search code examples
pythondataframecomparisonequals

Python Compare dataframe contents and replace if match?


I am working on a small program as practice in python and need some help trying to compare two columns in a data frame and replace the contents of one with another. I can see my results as I have converted the data frame to excel.

Anyway, I want to compare two columns Location and Abbrev based on the first three letters of each and then replace with the abbreviation. So the LON in Abbrev column should be compared with first three letters in London. The Abbrev column only gives one abbreviation instance.

So I start with:

Team         Location    GameDay   Abbrev
Arsenal      London      2/14      LON
Liverpool    Liverpool   2/14      LIV
Manchester   Manchester  2/16      MAN
Arsenal      London      2/23      NEW
Newcastle    Manchester  2/16      LEE

and I want to get this output:

Team        Location  GameDay   Abbrev
Arsenal     LON       2/14      LON
Liverpool   LIV       2/14      LIV
Manchester  MAN       2/16      MAN
Arsenal     LON       2/23      NEW
Newcastle   MAN       2/16      LEE

However, right now I am only getting a full replacement of the columns regardless of match.

Team        Location    GameDay   Abbrev
Arsenal     LON         2/14      LON
Liverpool   LIV         2/14      LIV
Manchester  MAN         2/16      MAN
Arsenal     NEW         2/23      NEW
Newcastle   LEE         2/16      LEE

This is the piece of code I have.

df['Location'] = df.apply(lambda row: row['Abbrev'] 
                          if row['Location'][:3].upper() != row['Abbrev'] 
                          else row['Abbrev'],axis=1)

I would like help figuring out whats wrong with my code.


Solution

  • You need to check if row['Location'][:3] is in abbreviations:

    import pandas as pd
    
    data = [['Arsenal', 'London', '2/14', 'LON'],
            ['Liverpool', 'Liverpool', '2/14', 'LIV'],
            ['Manchester', 'Manchester', '2/16', 'MAN'],
            ['Arsenal', 'London', '2/23', 'NEW'],
            ['Newcastle', 'Manchester', '2/16', 'LEE']]
    
    df = pd.DataFrame(data=data, columns=['Team', 'Location', 'GameDay', 'Abbrev'])
    
    abbreviations = set(df.Abbrev.values)
    df['Location'] = df.apply(lambda row: row['Location'][:3].upper() if row['Location'][:3].upper() in abbreviations else row['Abbrev'], axis=1)
    
    print(df)
    

    Output

             Team Location GameDay Abbrev
    0     Arsenal      LON    2/14    LON
    1   Liverpool      LIV    2/14    LIV
    2  Manchester      MAN    2/16    MAN
    3     Arsenal      LON    2/23    NEW
    4   Newcastle      MAN    2/16    LEE
    

    UPDATE

    If you prefer a one-liner:

    df['Location'] = df.apply(lambda row: row['Location'][:3].upper() if row['Location'][:3].upper() in df.Abbrev.values else row['Abbrev'], axis=1)