Search code examples
pythondatabasepandasnumpyshift

Shifting certain rows to the left in pandas dataframe


I have a pandas database of some sports data. The columns are name, age, birth city, birth country, rookie, weight, and problem. The original data had birthcity as "City,State" for American players, so when I used a comma delimiter the result was two variables. So now all the American players are shifted over, and I needed to make a "Problem" variable to account for the excess.

How can I shift just the Americans over to the left across thousands of observations? Thanks!

What I have (please excuse the table formatting):

Name Age BirthCity BirthCountry Rookie Weight Problem

Frank 32 Seattle   WA           USA    N      200
Jake  24 Geneva    Switzerland   Y     210

Desired:

Name Age BirthCity BirthCountry Rookie Weight

Frank 32 Seattle   USA           N     200
Jake  24 Geneva    Switzerland   Y     210

Solution

  • One way is to first delete the 3rd (remember Python counts 0 first) column selectively, simultaneously adding an extra column NaN. Then delete the final Problem series.

    # df, start with this dataframe
    #
    #     Name  Age BirthCity BirthCountry Rookie Weight  Problem
    # 0  Frank   32   Seattle           WA    USA      N    200.0
    # 1   Jake   24    Geneva  Switzerland      Y    210      NaN
    
    def shifter(row):
        return np.hstack((np.delete(np.array(row), [3]), [np.nan]))
    
    mask = df['Rookie'] == 'USA'
    df.loc[mask, :] = df.loc[mask, :].apply(shifter, axis=1)
    
    df = df.drop(['Problem'], axis=1)
    
    #     Name  Age BirthCity BirthCountry Rookie Weight
    # 0  Frank   32   Seattle          USA      N    200
    # 1   Jake   24    Geneva  Switzerland      Y    210