Tags: python, pandas, dataframe, data-cleaning

Python function not accepting the second argument


Sample DataFrame:

import pandas as pd

df = pd.DataFrame({'City':['York New', 'Parague', 'New Delhi', 'Venice', 'new Orleans'],
                    'Event':['Music', 'Poetry', 'Theatre', 'Comedy', 'Tech_Summit'],
                    'Cost':[10000, 5000, 15000, 2000, 12000]})
  
index_ = [pd.Period('02-2018'), pd.Period('04-2018'),
          pd.Period('06-2018'), pd.Period('10-2018'), pd.Period('12-2018')]
  
df.index = index_
print(df)

Problem statement: for cities whose names start with 'New' or 'new', replace that prefix with 'New_'.

First, I created a new column in the DataFrame that records whether the city name contains "new" and, if so, at what position:

df["pos"] = df["City"].apply(lambda x: x.lower().find("new"))

Then I wrote a function that replaces "New" or "new" with "New_" when it appears at the start of the city name:

def replace_new(city,pos):
    if pos==0:
        return city.replace("[Nn]ew", "New_", regex = True)
    else:
        return city

df = df[["City","pos"]].apply(replace_new, axis = 1)

When I run the line above, I get this error: "("replace_new() missing 1 required positional argument: 'pos'", 'occurred at index 2018-02')"

What am I doing wrong here? Please help


Solution

  • Use the pandas Series.str.replace method with an anchored, case-insensitive regex:

    df['City'] = df['City'].str.replace(r'^new\s*', 'New_', case=False, regex=True)
    

    output:

                    City        Event   Cost
    2018-02     York New        Music  10000
    2018-04      Parague       Poetry   5000
    2018-06    New_Delhi      Theatre  15000
    2018-10       Venice       Comedy   2000
    2018-12  New_Orleans  Tech_Summit  12000
    

    regex:

    ^       # match the start of the string
    new     # match "new" (case-insensitive because of case=False)
    \s*     # match zero or more whitespace characters
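
As for why the original attempt fails: with axis=1, DataFrame.apply calls the function once per row and passes the whole row as a single Series, so replace_new receives one argument and 'pos' is never supplied. Separately, Python's built-in str.replace has no regex option; that keyword belongs to the pandas Series.str.replace method. The sketch below is one possible way to keep the function-based approach working. It assumes the same sample cities as in the question; the names NEW_PREFIX, fixed and fixed_rowwise are made up for illustration.

```python
import re

import pandas as pd

df = pd.DataFrame({'City': ['York New', 'Parague', 'New Delhi', 'Venice', 'new Orleans']})

# Hypothetical helper pattern: "new" at the start of the string, any case,
# followed by zero or more whitespace characters.
NEW_PREFIX = re.compile(r'^new\s*', flags=re.IGNORECASE)


def replace_new(city):
    # Plain Python strings have no regex-aware replace, so re.sub does the work.
    return NEW_PREFIX.sub('New_', city)


# Series.apply passes one value at a time, so no helper "pos" column is needed.
fixed = df['City'].apply(replace_new)

# If both columns really are wanted, read them from the row explicitly,
# because axis=1 hands the function a single row Series.
df['pos'] = df['City'].str.lower().str.find('new')
fixed_rowwise = df.apply(
    lambda row: replace_new(row['City']) if row['pos'] == 0 else row['City'],
    axis=1,
)

print(fixed.tolist())
```

Both variants leave 'York New' untouched because the pattern is anchored to the start of the string. The vectorised str.replace call shown above is usually the simpler choice; the row-wise form is mainly useful when the decision depends on several columns.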