
Creating dummy variables based on a string keyword using Python


I have a dataframe that looks like this:

 PROMOTED_PRODUCT__CREATIVE              ROAS
0   Simple Green 1 Gal. Concentrated    0.027573
1   Simple Green 1 Gal. Concentrated    0.082969
2   Simple Green 1 Gal. Concentrated    0.056278
3   Simple Green 32 oz  Concentrated    0.037286
4   Simple Green 32 oz  Concentrated    0.355841
5   Simple Green 32 oz  Concentrated    0.355853
6   Simple Green 16 oz  Concentrated    0.355923
7   Simple Green 16 oz  Concentrated    0.355749
8   Simple Green 16 oz  Concentrated    0.355810

I am trying to create dummy variables based on an attribute found in the strings of column "PROMOTED_PRODUCT__CREATIVE", like the following:

       1_gal  32_oz  16_oz
    0      1      0      0
    1      1      0      0
    2      1      0      0
    3      0      1      0
    4      0      1      0
    5      0      1      0
    ...

Is there a quick way to use pd.get_dummies() to get results like these based on keywords ('1 gal', '32 oz', '16 oz', etc.)?

Any help would be greatly appreciated. Thanks so much in advance!
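For the pd.get_dummies() part of the question, one possible approach (a sketch, not tested against the full data) is to pull the first matching size keyword out of each product name with `Series.str.extract` and pass the resulting Series to `pd.get_dummies`. The frame below reuses only three rows from the question, and the keyword list mirrors the one in the answer but without the leading space on '5 gal': because the regex keeps the leftmost match, '2.5 gal' is found before '5 gal' in a name containing "2.5 Gal". One limitation to be aware of: only the first size in each name is captured.

```python
import re

import pandas as pd

# Sample rows taken from the question's DataFrame
df = pd.DataFrame({
    "PROMOTED_PRODUCT__CREATIVE": [
        "Simple Green 1 Gal. Concentrated",
        "Simple Green 32 oz  Concentrated",
        "Simple Green 16 oz  Concentrated",
    ],
    "ROAS": [0.027573, 0.037286, 0.355923],
})

keywords = ["1 gal", "16 oz", "32 oz", "5 gal", "128 oz", "2.5 gal"]

# One alternation of the escaped keywords, wrapped in a single capture group
pattern = "(" + "|".join(re.escape(k) for k in keywords) + ")"

# Extract the first size keyword in each name (NaN when nothing matches),
# then normalize it to a column-friendly label such as '1_gal'
sizes = (
    df["PROMOTED_PRODUCT__CREATIVE"]
    .str.extract(pattern, flags=re.IGNORECASE, expand=False)
    .str.lower()
    .str.replace(" ", "_", regex=False)
)

# One 0/1 column per size; reindex so every keyword gets a column,
# including sizes that never occur in this data
columns = [k.replace(" ", "_") for k in keywords]
dummies = pd.get_dummies(sizes, dtype=int).reindex(columns=columns, fill_value=0)

df = df.join(dummies)
print(df[columns])
```

The `reindex` call is what makes this behave like a keyword loop: `get_dummies` on its own only creates columns for the sizes that actually appear in the data.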


Solution

  • I found a way to do this using a for loop.

    Code:

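    import re

    keyword_list = ['1 gal', '16 oz', '32 oz', ' 5 gal', '128 oz', '2.5 gal']

    # The leading space in ' 5 gal' stops it from also matching '2.5 gal'
    for keyword in keyword_list:
        # Set the keyword column to 1 where the product name contains the keyword
        # (non-matching rows stay NaN); na=False treats missing names as non-matches
        df_pi.loc[df_pi['PROMOTED_PRODUCT__CREATIVE'].str.contains(keyword, flags=re.IGNORECASE, na=False), keyword] = 1

    # Reformat columns again: spaces become underscores (' 5 gal' -> '_5_gal')
    df_pi.columns = df_pi.columns.str.replace(' ', '_')

    # Replace NaN with 0 in the dummy columns and cast them to int64
    dummy_cols = ['1_gal', '16_oz', '32_oz', '_5_gal', '128_oz', '2.5_gal']
    df_pi[dummy_cols] = df_pi[dummy_cols].fillna(0).astype('int64')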
    

    Output:

        1_gal  16_oz  32_oz  _5_gal ...
    1       1      0      0       0
    2       1      0      0       0
    3       1      0      0       0
    4 ...
    

    I'll fold the bottom half (renaming the columns, filling NaN and casting to int) into the loop and clean it up later. Thanks to anyone who was looking into this; I hope this helps someone down the line.
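One way the bottom half could be folded into the loop is sketched below. It assumes `df_pi` has the same `PROMOTED_PRODUCT__CREATIVE` column; the three-row frame is only stand-in data taken from the question. Each pass writes a complete 0/1 column, so no NaN fill or int cast is needed afterwards, and only the new columns get underscore names instead of renaming every column in the frame.

```python
import pandas as pd

# Stand-in for the answer's df_pi, built from rows shown in the question
df_pi = pd.DataFrame({
    "PROMOTED_PRODUCT__CREATIVE": [
        "Simple Green 1 Gal. Concentrated",
        "Simple Green 32 oz  Concentrated",
        "Simple Green 16 oz  Concentrated",
    ],
    "ROAS": [0.027573, 0.037286, 0.355923],
})

# Same keywords as the answer; the leading space in ' 5 gal' avoids matching '2.5 gal'
keyword_list = ["1 gal", "16 oz", "32 oz", " 5 gal", "128 oz", "2.5 gal"]

for keyword in keyword_list:
    # Build the column name up front (' 5 gal' -> '_5_gal', as in the answer)
    col = keyword.replace(" ", "_")
    # regex=False treats the '.' in '2.5 gal' literally, case=False ignores
    # 'Gal' vs 'gal', and na=False turns missing names into False rather than NaN
    matches = df_pi["PROMOTED_PRODUCT__CREATIVE"].str.contains(
        keyword, case=False, regex=False, na=False
    )
    df_pi[col] = matches.astype("int64")

print(df_pi.drop(columns="PROMOTED_PRODUCT__CREATIVE"))
```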