Say I have this DataFrame:
import pandas as pd
df = pd.DataFrame({'Verbatim': ['Pants', 'Shirts', 'Shirts', 'Pants', 'Shoes', 'Shoes', 'Shoes', 'Shoes', 'Dresses, Shoes, Dresses', 'Pants', 'Pants', 'Shirts', 'Dresses Pants Shoes', 'Shoes Pants', 'Pants', 'Pants', 'Dresses', 'Pants', 'Pants', 'Dresses']})
Through various steps, I determine that all of my unique words in the above are
unique_words = ('Pants', 'Shirts', 'Shoes', 'Dresses')
I now want to add columns to my DataFrame that denote the presence of each unique word in the "Verbatim" column. In other words, I am creating dummy variables from verbatim text: if a respondent noted "Dresses" in their response, they would get a 1 in the Dresses column.
How do I use a loop/apply statement to automate this? I would like to do something like this:
for word in unique_words:
    df['word'] = 0
    df.loc[df['Verbatim'].str.contains("word"), 'word'] = 1
Essentially, I want to know how to use the loop variable (word) to create a DataFrame column named after its current value. How do I reference the iterator inside the loop? This code works when I write it out manually for each word, but I can't figure out how to loop it.
Thanks!
You can use the apply function, passing the loop variable word itself (without quotes) as the column name:
for word in unique_words:
    # word (no quotes) is the current value, so it names the new column
    df[word] = df.apply(lambda x: word in x["Verbatim"], axis=1)
print(df.head())
The output is:
   Verbatim  Pants  Shirts  Shoes  Dresses
0     Pants   True   False  False    False
1    Shirts  False    True  False    False
2    Shirts  False    True  False    False
3     Pants   True   False  False    False
4     Shoes  False   False   True    False
If you would rather have integers than booleans, you can write 0s and 1s instead:
for word in unique_words:
    df[word] = df.apply(lambda x: 1 if word in x["Verbatim"] else 0, axis=1)
print(df.head())
Resulting in:
   Verbatim  Pants  Shirts  Shoes  Dresses
0     Pants      1       0      0        0
1    Shirts      0       1      0        0
2    Shirts      0       1      0        0
3     Pants      1       0      0        0
4     Shoes      0       0      1        0
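As a possible alternative, the same columns can probably be built without apply, by reusing the str.contains call from the question. This is only a minimal sketch on a shortened copy of the data. Passing regex=False is my assumption that each word should be matched as a literal substring, which mirrors the `in` check above.

```python
import pandas as pd

# Shortened copy of the question's data, including the multi-word rows.
df = pd.DataFrame({'Verbatim': ['Pants', 'Shirts', 'Dresses, Shoes, Dresses',
                                'Dresses Pants Shoes', 'Shoes Pants']})
unique_words = ('Pants', 'Shirts', 'Shoes', 'Dresses')

for word in unique_words:
    # Use the loop variable itself (no quotes) as both the search term
    # and the new column name.
    # regex=False (an assumption) treats the word as a literal substring.
    df[word] = df['Verbatim'].str.contains(word, regex=False).astype(int)

print(df)
```

Like the apply version, this is a substring match, so a word such as 'Pants' would also be flagged inside a longer word that contains it.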