I am in the process of cleaning many variables (columns in a data frame) so that I can perform text analysis on them.
I have a data frame called econ_data.
Here I create a 'list' of all the variables that need to be transformed, for example transforming all text to lower case and removing stop words.
open_responses = ['choice_open_1_f', 'choice_open_1_m', 'choice_open_2_f ', 'choice_open_2_m']
Then I want to create a for loop that cleans up these variables so that I can perform text analysis.
for z in open_responses:
    econ_data[z] = econ_data[z].astype(str).str.replace('/',' ')
    econ_data[z] = econ_data[z].apply(lambda x: " ".join(x.lower() for x in x.split()))
    locals()[econ_data[f"{z}_stop"]] = econ_data[f"{z}"].apply(lambda x: " ".join(x for x in x.split() if x not in stop_words))
The first two lines in the for loop work. However, when I try to add a new column to the data frame containing each entry with the stop words removed, I get a KeyError ("KeyError: 'choice_open_1_f_stop'").
Please can someone explain how I can solve this issue?
Many thanks!
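You get the error because of the left-hand side, locals()[econ_data[f"{z}_stop"]].
Python has to evaluate econ_data[f"{z}_stop"] first, to use its value as the key for locals(), and that column does not exist yet, so the lookup raises the KeyError.
You do not need locals() at all. Do a simple assignment instead, keeping the same right-hand side: econ_data[f"{z}_stop"] = ...
Assigning to a column name that does not exist yet makes the data frame create that column.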
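
For reference, here is a minimal sketch of the corrected loop. The sample rows and the stop_words set are made up for illustration (your real stop-word list probably comes from a library such as NLTK), and I have chained .str.lower() onto the first step instead of using a separate apply, which is a small change from your version.

```python
import pandas as pd

# Hypothetical sample data and stop-word set, for illustration only.
econ_data = pd.DataFrame({
    "choice_open_1_f": ["The price/quality WAS good", "It is cheap"],
    "choice_open_1_m": ["A friend told me", "Close to the office"],
})
stop_words = {"the", "a", "is", "it", "was", "to"}

open_responses = ["choice_open_1_f", "choice_open_1_m"]

for z in open_responses:
    # Replace slashes with spaces and lower-case the text.
    econ_data[z] = (
        econ_data[z].astype(str).str.replace("/", " ", regex=False).str.lower()
    )
    # Plain assignment: the "<name>_stop" column is created because it
    # does not exist yet. No locals() needed.
    econ_data[f"{z}_stop"] = econ_data[z].apply(
        lambda text: " ".join(w for w in text.split() if w not in stop_words)
    )

print(econ_data[["choice_open_1_f_stop", "choice_open_1_m_stop"]])
```

Each original column keeps its cleaned text, and the stop-word-free version sits next to it under the _stop name.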