I have a pandas DataFrame with a text column called "term" that holds the string values ' 36 months' and ' 60 months' (both with leading spaces). I want to replace each string with just its numeric part, i.e. 36 or 60.
I'm trying to follow this post: "Updating a pandas column with a dictionary lookup".
Here's my code:
import pandas as pd
import re
loan_data = pd.read_csv('loan_data_2007_2014.csv')
# create a dictionary with the from (key) and to (value) pairs for the lookup replacement
term_set = set(loan_data['term'])
term_dict = {x: int(re.sub('[^0-9]', '', x)) for x in term_set}
# term_dict => {' 36 months': 36, ' 60 months': 60} (both keys have a leading space)
loan_data['term_int'] = loan_data['term']
loan_data['term_int'].update(pd.Series(term_dict))
print("unique term_int = ", loan_data['term_int'].unique())
I expected the term_int field in the DataFrame to contain values 36 or 60, but the lookup didn't work.
What did I do wrong?
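Use map instead of update. Series.update aligns on index labels: pd.Series(term_dict) is indexed by the term strings, while loan_data['term_int'] is indexed by the row labels (0, 1, 2, ... by default), so no labels match and no value is replaced. map instead looks up each value of the column in the dictionary.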
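import pandas as pd
import re
loan_data = pd.read_csv('loan_data_2007_2014.csv')
# build the lookup from each distinct term string to its integer value
term_set = set(loan_data['term'])
term_dict = {x: int(re.sub('[^0-9]', '', x)) for x in term_set}
# map replaces each value in the column with its dictionary entry
loan_data['term_int'] = loan_data['term'].map(term_dict)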
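To see the difference between the two calls without the CSV file, here is a minimal sketch on a small made-up DataFrame. The three sample rows are an assumption standing in for the real data, and the names `updated`, `mapped` and `extracted` are only illustrative. The last line shows one possible alternative that skips the dictionary and pulls the digits out with `str.extract`.

```python
import pandas as pd

# Hypothetical sample rows standing in for loan_data_2007_2014.csv
loan_data = pd.DataFrame({"term": [" 36 months", " 60 months", " 36 months"]})

term_dict = {" 36 months": 36, " 60 months": 60}

# update() aligns on index labels. The lookup Series is indexed by the
# term strings, the column by 0, 1, 2, so nothing matches and the copy
# is left unchanged.
updated = loan_data["term"].copy()
updated.update(pd.Series(term_dict))

# map() looks up each value of the column in the dictionary.
mapped = loan_data["term"].map(term_dict)

# Alternative without a dictionary: extract the digits and convert them.
extracted = loan_data["term"].str.extract(r"(\d+)", expand=False).astype(int)

print(updated.tolist())
print(mapped.tolist())
print(extracted.tolist())
```

With this sample, `updated` still holds the original strings, while `mapped` and `extracted` both hold the integers.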