I am working on a small program as practice in python and need some help trying to compare two columns in a data frame and replace the contents of one with another. I can see my results as I have converted the data frame to excel.
Anyway, I want to compare two columns Location
and Abbrev
based on the first three letters of each and then replace with the abbreviation. So the LON
in Abbrev
column should be compared with first three letters in London. The Abbrev
column only gives one abbreviation instance.
So I start with:
Team Location GameDay Abbrev
Arsenal London 2/14 LON
Liverpool Liverpool 2/14 LIV
Manchester Manchester 2/16 MAN
Arsenal London 2/23 NEW
Newcastle Manchester 2/16 LEE
and I want to get this output:
Team Location GameDay Abbrev
Arsenal LON 2/14 LON
Liverpool LIV 2/14 LIV
Manchester MAN 2/16 MAN
Arsenal LON 2/23 NEW
Newcastle MAN 2/16 LEE
However, right now I am only getting a full replacement of the columns regardless of match.
Team Location GameDay Abbrev
Arsenal LON 2/14 LON
Liverpool LIV 2/14 LIV
Manchester MAN 2/16 MAN
Arsenal NEW 2/23 NEW
Newcastle LEE 2/16 LEE
This is the piece of code I have.
df['Location'] = df.apply(lambda row: row['Abbrev']
if row['Location'][:3].upper() != row['Abbrev']
else row['Abbrev'],axis=1)
I would like help figuring out whats wrong with my code.
You need to check if row['Location'][:3]
is in abbreviations:
import pandas as pd
data = [['Arsenal', 'London', '2/14', 'LON'],
['Liverpool', 'Liverpool', '2/14', 'LIV'],
['Manchester', 'Manchester', '2/16', 'MAN'],
['Arsenal', 'London', '2/23', 'NEW'],
['Newcastle', 'Manchester', '2/16', 'LEE']]
df = pd.DataFrame(data=data, columns=['Team', 'Location', 'GameDay', 'Abbrev'])
abbreviations = set(df.Abbrev.values)
df['Location'] = df.apply(lambda row: row['Location'][:3].upper() if row['Location'][:3].upper() in abbreviations else row['Abbrev'], axis=1)
print(df)
Output
Team Location GameDay Abbrev
0 Arsenal LON 2/14 LON
1 Liverpool LIV 2/14 LIV
2 Manchester MAN 2/16 MAN
3 Arsenal LON 2/23 NEW
4 Newcastle MAN 2/16 LEE
UPDATE
If you prefer a one-liner:
df['Location'] = df.apply(lambda row: row['Location'][:3].upper() if row['Location'][:3].upper() in df.Abbrev.values else row['Abbrev'], axis=1)