Search code examples
pythonpandasdata-munging

Python, Pandas - Issue applying function to a column in a dataframe to replace only certain items


I have a dictionary of abbreviations of some city names that our system (for some reason) applies to data (i.e. 'Kansas City' is abbreviated 'Kansas CY', and Oklahoma City is spelled correctly).

I am having an issue getting my function to apply to the column of the dataframe but it works when I pass in strings of data. Code sample below:

def multiple_replace(text, dict):
  # Create a regular expression  from the dictionary keys
  regex = re.compile("(%s)" % "|".join(map(re.escape, dict.keys())))

  # For each match, look-up corresponding value in dictionary
  return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text)

testDict = {"Kansas CY": "Kansas City"}

dfData['PREV_CITY'] = dfData['PREV_CITY'].apply(multiple_replace, dict=testDict)

When I add 'axis=1' into that last line it errors out saying I provided too many args. Otherwise, it runs without error it just doesn't make the changes when there is a match to the dictionary.

Thank you in advance! -Reece


Solution

  • You can use map and pass a dict to replace exact matches against the dict keys with the dict values, as you may have case-sensitive matches I'd lower all the strings first prior to the match:

    dfData['PREV_CITY'] = dfData['PREV_CITY'].str.lower().map(testDict, na_action='ignore')
    

    this assumes the keys in your dict are also lower case