Tags: python, pandas, dataframe, text

Python: count the words that match between two columns


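I want to count how many times the names listed in one column ('people') appear in the text of another column ('original'). Only whole-word matches should count, so 'Mary' does not match 'Marys', and a name that appears twice counts twice. Here is my dataframe: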

original                           people       result
John is a good friend              John, Mary   1
Mary and Peter are going to marry  Peter, Mary  2
Bond just met the Bond girl        Bond         2
Chris is having dinner             NaN          0
All Marys are here                 Mary         0

I tried to adapt the code suggested in "Check if a column contains words from another column in pandas dataframe":

import pandas as pd
import re

# For each row, keep the names from 'people' that occur as whole words in 'original'
df['result'] = [', '.join([p for p in po
                           if re.search(f'\\b{p}\\b', o)])
                for o, po in zip(df.original, df.people.str.split(',\o*'))
               ]
# Afterwards I would count the number of words in column 'result'

but then I receive the following message:

error: bad escape \o at position 1
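The message comes from the regular expression engine: by default, pandas' str.split treats a pattern longer than one character as a regular expression, and \o is not a valid regex escape. Whitespace is matched by \s, so the intended pattern is presumably ,\s* (a comma followed by any spaces). A minimal sketch with the standard re module that reproduces the error and checks the corrected pattern (the helper compile_error is hypothetical, written only for this illustration):

```python
import re


def compile_error(pattern):
    """Hypothetical helper: return the re.error message for pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# The question's pattern: '\o' is not a regex escape, so compiling fails
print(compile_error(r',\o*'))  # bad escape \o at position 1

# '\s*' matches any run of whitespace after the comma, so it compiles and splits as intended
print(compile_error(r',\s*'))  # None
print(re.split(r',\s*', 'Peter, Mary'))  # ['Peter', 'Mary']
```

In the question's code that means df.people.str.split(r',\s*'). Two further issues would surface once the split works: str.split leaves the NaN row as NaN, which the inner comprehension cannot iterate over (so something like df.people.fillna('') is needed first), and counting the names in 'result' counts each matched name only once, so the Bond row would give 1 rather than the expected 2.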

Could anyone make a suggestion?


Solution

    In [39]: df = pd.DataFrame({'original': ["John is a good friend",
       ...:                                 "Mary and Peter are going to marry",
       ...:                                 "Bond just met the Bond girl",
       ...:                                 "Chris is having dinner",
       ...:                                 "All Marys are here"],
       ...:                    'people': ["John, Mary", "Peter, Mary", "Bond", '', "Mary"]})

    In [40]: df
    Out[40]:
                                original       people
    0              John is a good friend   John, Mary
    1  Mary and Peter are going to marry  Peter, Mary
    2        Bond just met the Bond girl         Bond
    3             Chris is having dinner
    4                 All Marys are here         Mary

    In [41]: df['result'] = df.apply(
       ...:     lambda row: sum((row['original'].count(p.strip())
       ...:                      for p in row['people'].split(',') if p),
       ...:                     start=0),
       ...:     axis=1)

    In [42]: df
    Out[42]:
                                original       people  result
    0              John is a good friend   John, Mary       1
    1  Mary and Peter are going to marry  Peter, Mary       2
    2        Bond just met the Bond girl         Bond       2
    3             Chris is having dinner                    0
    4                 All Marys are here         Mary       1
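The answer above uses str.count, which counts substrings, so 'Mary' is also found inside 'Marys' and the last row gets 1 instead of the 0 the question expects; it also relies on empty strings rather than NaN in 'people'. A sketch that keeps the question's whole-word idea instead, counting every occurrence with re.findall and a \b...\b pattern and treating NaN as "no names" (the helper name count_name_matches is hypothetical):

```python
import re

import numpy as np
import pandas as pd


def count_name_matches(text, people):
    """Hypothetical helper: count whole-word occurrences in text of each comma-separated name."""
    if pd.isna(people) or not people.strip():
        return 0  # NaN or empty 'people' cell: nothing to look for
    names = [n for n in re.split(r',\s*', people.strip()) if n]
    # re.escape guards against names containing regex metacharacters;
    # \b...\b rejects partial matches such as 'Mary' inside 'Marys'
    return sum(len(re.findall(rf'\b{re.escape(n)}\b', text)) for n in names)


df = pd.DataFrame({
    'original': ["John is a good friend",
                 "Mary and Peter are going to marry",
                 "Bond just met the Bond girl",
                 "Chris is having dinner",
                 "All Marys are here"],
    'people': ["John, Mary", "Peter, Mary", "Bond", np.nan, "Mary"],
})

df['result'] = [count_name_matches(o, p) for o, p in zip(df['original'], df['people'])]
print(df['result'].tolist())  # [1, 2, 2, 0, 0]
```

Matching is case-sensitive, which is why 'marry' in the second row is not counted as 'Mary'; add flags=re.IGNORECASE to re.findall if that is not what you want.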