Tags: python, pandas, dataframe, loops, apply

How to use a for loop to create new columns in a Pandas dataframe


Say I have this dataframe

import pandas as pd
df = pd.DataFrame({'Verbatim': ['Pants', 'Shirts', 'Shirts', 'Pants', 'Shoes', 'Shoes', 'Shoes', 'Shoes', 'Dresses, Shoes, Dresses', 'Pants', 'Pants', 'Shirts', 'Dresses Pants Shoes', 'Shoes Pants', 'Pants', 'Pants', 'Dresses', 'Pants', 'Pants', 'Dresses']})

Through various steps, I determine that all of my unique words in the above are

unique_words = ('Pants', 'Shirts', 'Shoes', 'Dresses')

I now want to add columns to my DataFrame that denote the presence of each unique word in the "Verbatim" column. In other words, I am creating dummy variables from verbatim text: if a respondent noted "Dresses" in their response, they would get a 1 in the Dresses column.

How do I use a loop/apply statement to automate this? I would like to do something like this

for word in unique_words:
    df['word'] = 0
    df.loc[df['Verbatim'].str.contains("word"), 'word'] = 1

Essentially, I want to know how to use the loop variable (word) to create a dataframe column named after its current value. How do I reference the loop variable inside the loop? The code works when I write it out by hand for each word, but I can't figure out how to loop it.

Thanks!


Solution

  • You can use the apply function, passing the loop variable word itself (without quotes) as the column name:

    for word in unique_words:
        df[word] = df.apply(lambda x: word in x["Verbatim"], axis=1)
    
    print(df.head())
    

    The output is:

      Verbatim  Pants  Shirts  Shoes  Dresses
    0    Pants   True   False  False    False
    1   Shirts  False    True  False    False
    2   Shirts  False    True  False    False
    3    Pants   True   False  False    False
    4    Shoes  False   False   True    False
    

    If you would rather have integers than booleans, you can produce 0s and 1s like this:

    for word in unique_words:
        df[word] = df.apply(lambda x: 1 if word in x["Verbatim"] else 0, axis=1)
    
    print(df.head())
    

    Resulting in:

      Verbatim  Pants  Shirts  Shoes  Dresses
    0    Pants      1       0      0        0
    1   Shirts      0       1      0        0
    2   Shirts      0       1      0        0
    3    Pants      1       0      0        0
    4    Shoes      0       0      1        0
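
As an alternative to the row-wise apply above, here is a minimal sketch that is closer to the loop attempted in the question. It uses the loop variable word directly (no quotes) and the vectorized Series.str.contains method. The shortened df below is only an illustrative subset of the question's data, and regex=False is an assumption that each word should be matched as a literal substring.

```python
import pandas as pd

# Illustrative subset of the question's data
df = pd.DataFrame({'Verbatim': ['Pants', 'Shirts', 'Dresses, Shoes, Dresses',
                                'Dresses Pants Shoes', 'Shoes Pants']})
unique_words = ('Pants', 'Shirts', 'Shoes', 'Dresses')

for word in unique_words:
    # word (unquoted) holds the current value, e.g. 'Pants';
    # the quoted string 'word' would always refer to a column literally named "word".
    # regex=False matches the word as a plain substring rather than as a pattern.
    df[word] = df['Verbatim'].str.contains(word, regex=False).astype(int)

print(df)
```

Note that this is substring matching, the same as the `in` test in the apply version, so a word would also match inside a longer word. If that matters for your data, you would need a stricter match (for example, one based on word boundaries).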