I want to create new columns based on phrase existence
This is my data
No  Body
1   Office software is already paid
2   Excel software is not paid yet
3   Power point software is already paid
I want to categorize the rows by whether certain phrases appear in Body. This is my code:
countries1 = df.Body.str.extract('(software|is already paid)', expand = False)
dummies1 = pd.get_dummies(countries1)
df_1 = pd.concat([df,dummies1],axis = 1)
The result is
No  Body                                  software  is already paid
1   Office software is already paid       0         1
2   Excel software is not paid yet        1         0
3   Power point software is already paid  0         1
What I expected is
No  Body                                  software  is already paid
1   Office software is already paid       1         1
2   Excel software is not paid yet        1         0
3   Power point software is already paid  1         1
What's wrong with my code? Or maybe I'm not using the right function?
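str.extract returns only the first match in each row, so a row that contains both phrases still produces a single dummy column. Let's try using str.extractall, which returns every match: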
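# extractall yields one row per match; get_dummies encodes each match,
# and grouping on the original row index (level 0) collapses them per row.
# Note: .sum(level=0) was removed in pandas 2.0; use groupby(level=0).sum().
df.assign(**df.Body.str.extractall('(software|is already paid)')[0]
            .str.get_dummies().groupby(level=0).sum())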
Output:
   No                                  Body  is already paid  software
0   1       Office software is already paid                1         1
1   2        Excel software is not paid yet                0         1
2   3  Power point software is already paid                1         1
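
If the goal is simply one 0/1 column per phrase, a different approach is to test each phrase separately with str.contains instead of going through a regex extraction. The following is a minimal sketch, assuming the sample data from the question; the names phrases, flags and result are made up for illustration, and regex=False is used so the phrases are matched as literal text.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed column names: No, Body).
df = pd.DataFrame({
    "No": [1, 2, 3],
    "Body": [
        "Office software is already paid",
        "Excel software is not paid yet",
        "Power point software is already paid",
    ],
})

# Hypothetical list of phrases to flag; each becomes its own column.
phrases = ["software", "is already paid"]

# For every phrase, build a 0/1 Series marking the rows that contain it.
# regex=False treats the phrase as a plain substring, not a pattern.
flags = {
    phrase: df["Body"].str.contains(phrase, regex=False).astype(int)
    for phrase in phrases
}

# Attach the new columns in the order the phrases were listed.
result = df.assign(**flags)
print(result)
```

Because each phrase is checked independently, a row that matches several phrases gets a 1 in each corresponding column, and the columns follow the order of the phrases list rather than alphabetical order.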