I have a dataframe that looks like this:
         PROMOTED_PRODUCT__CREATIVE      ROAS
0  Simple Green 1 Gal. Concentrated  0.027573
1  Simple Green 1 Gal. Concentrated  0.082969
2  Simple Green 1 Gal. Concentrated  0.056278
3   Simple Green 32 oz Concentrated  0.037286
4   Simple Green 32 oz Concentrated  0.355841
5   Simple Green 32 oz Concentrated  0.355853
6   Simple Green 16 oz Concentrated  0.355923
7   Simple Green 16 oz Concentrated  0.355749
8   Simple Green 16 oz Concentrated  0.355810
I am trying to create dummy variables based on an attribute found in the string in column "PROMOTED_PRODUCT__CREATIVE", like the following:
   1_gal  32_oz  16_oz
0      1      0      0
1      1      0      0
2      1      0      0
3      0      1      0
4      0      1      0
5      0      1      0
...
Is there a quick way to use pd.get_dummies() to produce results like these based on keywords ('1 gal', '32 oz', '16 oz', etc.)?
Any help would be greatly appreciated. Thanks so much in advance!
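For reference, here is a minimal sketch of one way to use pd.get_dummies() directly (not the loop approach used further down): pull the size out of the product string with `Series.str.extract`, then dummy-encode the extracted values. It assumes every size looks like a number followed by `gal` or `oz`; that regex and the `1_gal`-style column naming are assumptions, and the sample rows are copied from the frame above.

```python
import re

import pandas as pd

# A few rows copied from the frame above
df = pd.DataFrame({
    'PROMOTED_PRODUCT__CREATIVE': [
        'Simple Green 1 Gal. Concentrated',
        'Simple Green 1 Gal. Concentrated',
        'Simple Green 32 oz Concentrated',
        'Simple Green 16 oz Concentrated',
    ],
    'ROAS': [0.027573, 0.082969, 0.037286, 0.355923],
})

# Assumed pattern: a number (optionally with a decimal part) followed by "gal" or "oz"
size = (
    df['PROMOTED_PRODUCT__CREATIVE']
    .str.extract(r'(\d+(?:\.\d+)?\s*(?:gal|oz))', flags=re.IGNORECASE)[0]
    .str.lower()                             # "1 Gal" -> "1 gal"
    .str.replace(' ', '_', regex=False)      # "1 gal" -> "1_gal"
)

# One 0/1 column per distinct size; rows with no match get all zeros
dummies = pd.get_dummies(size, dtype=int)
df = df.join(dummies)
```

Unlike a fixed keyword list, this creates a column for any size that fits the pattern, so new sizes are picked up automatically; the trade-off is that a product string in an unexpected format simply gets no dummy at all.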
I found a way to do this using a for loop.
Code:
import re

# The leading space in ' 5 gal' stops it from also matching '2.5 gal'
keyword_list = ['1 gal', '16 oz', '32 oz', ' 5 gal', '128 oz', '2.5 gal']
for keyword in keyword_list:
    # re.escape makes the '.' in '2.5 gal' match a literal dot, not any character
    mask = df_pi['PROMOTED_PRODUCT__CREATIVE'].str.contains(re.escape(keyword), flags=re.IGNORECASE)
    df_pi.loc[mask, keyword] = 1
# Reformat columns again (spaces -> underscores)
df_pi.columns = df_pi.columns.str.replace(' ', '_')
# Replace NaN with 0 and cast the new columns to int64
dummy_cols = ['1_gal', '16_oz', '32_oz', '_5_gal', '128_oz', '2.5_gal']
df_pi[dummy_cols] = df_pi[dummy_cols].fillna(0).astype('int64')
Output:
   1_gal  16_oz  32_oz  5_gal  ...
1      1      0      0      0
2      1      0      0      0
3      1      0      0      0
4  ...
I'll fold the bottom half into the loop and clean it up a bit later. Thanks to anyone who was looking into this; hope this helps someone down the line.
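A minimal sketch of what folding the NaN-filling and integer cast into the loop could look like. `add_size_dummies` is a hypothetical helper name. Instead of `.loc` assignment followed by `fillna`, each column is built directly from the boolean mask returned by `str.contains(..., case=False, regex=False)`, so there are no NaNs to fill; `regex=False` also means the `.` in `'2.5 gal'` is matched literally. The sample rows are copied from the question.

```python
import pandas as pd


def add_size_dummies(df, col, keywords):
    """Hypothetical helper: add one 0/1 int64 column per keyword found in df[col]."""
    out = df.copy()
    for keyword in keywords:
        # Case-insensitive literal substring match -> boolean Series
        mask = out[col].str.contains(keyword, case=False, regex=False)
        # Spaces -> underscores, e.g. '1 gal' -> '1_gal', ' 5 gal' -> '_5_gal'
        out[keyword.replace(' ', '_')] = mask.astype('int64')
    return out


df_pi = pd.DataFrame({
    'PROMOTED_PRODUCT__CREATIVE': [
        'Simple Green 1 Gal. Concentrated',
        'Simple Green 32 oz Concentrated',
        'Simple Green 16 oz Concentrated',
    ],
    'ROAS': [0.027573, 0.037286, 0.355923],
})

keyword_list = ['1 gal', '16 oz', '32 oz', ' 5 gal', '128 oz', '2.5 gal']
df_pi = add_size_dummies(df_pi, 'PROMOTED_PRODUCT__CREATIVE', keyword_list)
```

Returning a copy rather than modifying `df_pi` in place is just a style choice. The keyword list still has to be maintained by hand, including tricks like the leading space in `' 5 gal'`.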