
Dictionary update using .update() and pd.Series doesn't seem to work


I have a pandas DataFrame with a text column called "term" that holds the string values ' 36 months' and ' 60 months' (both with leading spaces). I want to replace each string with just its numeric part, i.e. 36 or 60.

I'm trying to follow this post: "Updating a pandas column with a dictionary lookup".

Here's my code:

import pandas as pd
import re

loan_data = pd.read_csv('loan_data_2007_2014.csv')

# create a dictionary with the from (key) and to (value) pairs for the lookup replacement
term_set = set(loan_data['term'])
term_dict = {x: int(re.sub('[^0-9]', '', x)) for x in term_set}

# term_dict => {' 36 months': 36, ' 60 months': 60} (both keys have a leading space)

loan_data['term_int'] = loan_data['term']

loan_data['term_int'].update(pd.Series(term_dict))

print("unique term_int = ", loan_data['term_int'].unique())

I expected the term_int field in the DataFrame to contain values 36 or 60, but the lookup didn't work.

What did I do wrong?


Solution

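  • Use map instead of update. Series.update aligns on the index, not on the values: it looks for the labels ' 36 months' and ' 60 months' in the index of loan_data['term_int'], which read_csv fills with 0, 1, 2, ... by default, finds no match, and leaves every value unchanged. map instead looks up each value of the column as a key in the dictionary.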

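    import pandas as pd
    import re
    
    loan_data = pd.read_csv('loan_data_2007_2014.csv')
    
    # Build the lookup: original string -> its digits as an int
    term_set = set(loan_data['term'])
    term_dict = {x: int(re.sub('[^0-9]', '', x)) for x in term_set}
    
    # map() replaces each value in 'term' with its dictionary entry
    loan_data['term_int'] = loan_data['term'].map(term_dict)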
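
To see the difference between the two methods without the CSV file, here is a minimal sketch. The three-row DataFrame is a made-up stand-in for loan_data (an assumption, not the real data), and the names updated and mapped are only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for loan_data, with the default 0..n-1 index
loan_data = pd.DataFrame({'term': [' 36 months', ' 60 months', ' 36 months']})
term_dict = {' 36 months': 36, ' 60 months': 60}

# update() matches on index labels. None of the labels 0, 1, 2 equals
# ' 36 months' or ' 60 months', so no value is overwritten.
updated = loan_data['term'].copy()
updated.update(pd.Series(term_dict))

# map() looks up each value of the column as a key in the dictionary.
mapped = loan_data['term'].map(term_dict)

print(updated.tolist())
print(mapped.tolist())
```

Here updated still holds the original strings, while mapped holds 36, 60, 36. Note that map returns NaN for any value that is not a key in the dictionary, so it may be worth checking mapped.isna().any() on real data.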