I have a pandas database of some sports data. The columns are name, age, birth city, birth country, rookie, weight, and problem. The original data had birthcity as "City,State" for American players, so when I used a comma delimiter the result was two variables. So now all the American players are shifted over, and I needed to make a "Problem" variable to account for the excess.
How can I shift just the Americans over to the left across thousands of observations? Thanks!
What I have (please excuse the table formatting):
Name Age BirthCity BirthCountry Rookie Weight Problem
Frank 32 Seattle WA USA N 200
Jake 24 Geneva Switzerland Y 210
Desired:
Name Age BirthCity BirthCountry Rookie Weight
Frank 32 Seattle USA N 200
Jake 24 Geneva Switzerland Y 210
One way is to first delete the 3rd (remember Python counts 0 first) column selectively, simultaneously adding an extra column NaN
. Then delete the final Problem
series.
# df, start with this dataframe
#
# Name Age BirthCity BirthCountry Rookie Weight Problem
# 0 Frank 32 Seattle WA USA N 200.0
# 1 Jake 24 Geneva Switzerland Y 210 NaN
def shifter(row):
return np.hstack((np.delete(np.array(row), [3]), [np.nan]))
mask = df['Rookie'] == 'USA'
df.loc[mask, :] = df.loc[mask, :].apply(shifter, axis=1)
df = df.drop(['Problem'], axis=1)
# Name Age BirthCity BirthCountry Rookie Weight
# 0 Frank 32 Seattle USA N 200
# 1 Jake 24 Geneva Switzerland Y 210