Creating Dummy Variables based on string keyword using Python

I have a dataframe that looks like this:

0   Simple Green 1 Gal. Concentrated    0.027573
1   Simple Green 1 Gal. Concentrated    0.082969
2   Simple Green 1 Gal. Concentrated    0.056278
3   Simple Green 32 oz  Concentrated    0.037286
4   Simple Green 32 oz  Concentrated    0.355841
5   Simple Green 32 oz  Concentrated    0.355853
6   Simple Green 16 oz  Concentrated    0.355923
7   Simple Green 16 oz  Concentrated    0.355749
8   Simple Green 16 oz  Concentrated    0.355810

I am trying to create dummy variables based on an attribute found in a string in the column "PROMOTED_PRODUCT__CREATIVE", like the following:

     1_gal   32_oz   16_oz
   0  1       0       0
   1  1       0       0
   2  1       0       0
   3  0       1       0
   4  0       1       0
   5  0       1       0

Is there a quick way to use pd.get_dummies() to produce the result above based on keywords ('1 gal', '32 oz', '16 oz', etc.)?

Any help would be greatly appreciated. Thanks so much in advance!


  • I found a way to do this using a for loop.


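    import re

    # The leading space in ' 5 gal' keeps it from also matching '2.5 gal'
    keyword_list = ['1 gal', '16 oz', '32 oz', ' 5 gal', '128 oz', '2.5 gal']
    for keyword in keyword_list:
        # Flag rows whose text contains the keyword (case-insensitive);
        # re.escape makes the '.' in '2.5 gal' match a literal dot
        mask = df_pi['PROMOTED_PRODUCT__CREATIVE'].str.contains(
            re.escape(keyword), flags=re.IGNORECASE)
        df_pi.loc[mask, keyword] = 1
    # Reformat columns again
    df_pi.columns = df_pi.columns.str.replace(' ', '_')
    # Replace NaN with 0, then convert the dummy columns to integers
    dummy_cols = ['1_gal', '16_oz', '32_oz', '_5_gal', '128_oz', '2.5_gal']
    df_pi[dummy_cols] = df_pi[dummy_cols].fillna(0).astype('int64')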


       1_gal  16_oz  32_oz  5_gal...
    1      1      0      0      0
    2      1      0      0      0
    3      1      0      0      0

    I'll fold the bottom half into the loop and clean it up a bit later. Thanks to anyone who was looking into this. I hope this helps someone down the line.
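
Since the question asks about pd.get_dummies() specifically, here is a minimal sketch of one way it might be combined with str.extract instead of a loop. It is an alternative under stated assumptions, not the approach used above: the three sample rows are copied from the question, the names `pattern`, `size`, `dummies` and `df_out` are made up for illustration, and each row is assumed to contain at most one size keyword (only the first match is kept).

```python
import re
import pandas as pd

# Three sample rows modeled on the question's data (assumed, for illustration).
df_pi = pd.DataFrame({
    "PROMOTED_PRODUCT__CREATIVE": [
        "Simple Green 1 Gal. Concentrated",
        "Simple Green 32 oz  Concentrated",
        "Simple Green 16 oz  Concentrated",
    ]
})

keyword_list = ["1 gal", "16 oz", "32 oz", "5 gal", "128 oz", "2.5 gal"]

# One capture group with all keywords; \b stops '5 gal' matching inside '15 gal',
# and re.escape keeps the '.' in '2.5 gal' literal.
pattern = r"\b(" + "|".join(re.escape(k) for k in keyword_list) + ")"

# Extract the first matching size per row, normalise case, and swap spaces
# for underscores so the values can serve as column names.
size = (
    df_pi["PROMOTED_PRODUCT__CREATIVE"]
    .str.extract(pattern, flags=re.IGNORECASE, expand=False)
    .str.lower()
    .str.replace(" ", "_", regex=False)
)

# One-hot encode, then reindex so every keyword gets a column even when no
# row contains it (rows without any match end up all zeros).
all_cols = [k.replace(" ", "_") for k in keyword_list]
dummies = pd.get_dummies(size, dtype="int64").reindex(columns=all_cols, fill_value=0)

df_out = df_pi.join(dummies)
print(df_out)
```

Because the regex keeps only the leftmost match, a product string that mentions two sizes would be flagged for just one of them; the loop with str.contains above flags both, so which behaviour is preferable depends on the data.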