I want to count how many times the words listed in one column appear in the text of another column. Here is my dataframe, with the expected output in the result column:
original                           people       result
John is a good friend              John, Mary   1
Mary and Peter are going to marry  Peter, Mary  2
Bond just met the Bond girl        Bond         2
Chris is having dinner             NaN          0
All Marys are here                 Mary         0
I tried the code suggested in "Check if a column contains words from another column in pandas dataframe":
import pandas as pd
import re

df['result'] = [
    ', '.join([p for p in po if re.search(f'\\b{p}\\b', o)])
    for o, po in zip(df.original, df.people.str.split(',\o*'))
]
# Afterwards I would count the number of words in column 'result'
But then I get the following error:
error: bad escape \o at position 1
Could anyone make a suggestion?
In [39]: df = pd.DataFrame({'original':["John is a good friend", "Mary and Peter are going to marry", "Bond just met the Bond girl", "Chris is having dinner", "All Marys are here"], "people": ["John, Mary", "Peter, Mary", "Bond", '', "Mary"]})
In [40]: df
Out[40]:
                            original       people
0              John is a good friend   John, Mary
1  Mary and Peter are going to marry  Peter, Mary
2        Bond just met the Bond girl         Bond
3             Chris is having dinner
4                 All Marys are here         Mary
In [41]: df['result'] = df.apply(lambda row: sum((row['original'].count(p.strip()) for p in row['people'].split(',') if p), start=0), axis=1)
In [42]: df
Out[42]:
                            original       people  result
0              John is a good friend   John, Mary       1
1  Mary and Peter are going to marry  Peter, Mary       2
2        Bond just met the Bond girl         Bond       2
3             Chris is having dinner                    0
4                 All Marys are here         Mary       1
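Two notes on the above. The error in the question comes from the split pattern ',\o*': \o is not a valid regular-expression escape, and the intended pattern was most likely ',\s*' (a comma followed by optional whitespace). Also, str.count matches substrings, which is why the last row gives 1 ("Marys" contains "Mary") rather than the 0 asked for. If whole-word matches are needed, something along these lines might work. It is only a sketch: count_whole_words is a made-up helper name, and the empty people cell is assumed to be None/NaN as in the question rather than an empty string (both are handled).

```python
import re

import pandas as pd

df = pd.DataFrame({
    "original": [
        "John is a good friend",
        "Mary and Peter are going to marry",
        "Bond just met the Bond girl",
        "Chris is having dinner",
        "All Marys are here",
    ],
    # Assumption: the missing entry is None/NaN, as shown in the question.
    "people": ["John, Mary", "Peter, Mary", "Bond", None, "Mary"],
})


def count_whole_words(text, names):
    """Hypothetical helper: count whole-word hits in `text` for each
    comma-separated name in `names` (case-sensitive)."""
    if not isinstance(names, str):  # None / NaN: nothing to look for
        return 0
    total = 0
    for name in names.split(","):
        name = name.strip()
        if name:
            # \b restricts matches to whole words; re.escape protects
            # names that contain regex metacharacters.
            total += len(re.findall(rf"\b{re.escape(name)}\b", text))
    return total


df["result"] = [
    count_whole_words(o, p) for o, p in zip(df["original"], df["people"])
]
print(df)
```

With the sample data this should give 1, 2, 2, 0, 0 in the result column, because "Marys" no longer counts as a match for "Mary". The match is case-sensitive, so "marry" is not counted either.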