python, python-3.x, pandas, dataframe, nlp

Partial Matching of name in a corpus to names in another column in a Pandas dataframe


I have a dataframe like this:

                            Name                            Corpus
0  James Bond Junior Bristleback     Agent James Bond went missing
1            Batman Bin Superman      Superman saves the day again
2                  Thor S/O Odin  Loki was last seen in March 2020

I want to get this output:

                            Name                            Corpus  Value
0  James Bond Junior Bristleback     Agent James Bond went missing   True
1            Batman Bin Superman      Superman saves the day again   True
2                  Thor S/O Odin  Loki was last seen in March 2020  False

I have previously tried regex, but I can't get the desired output. Is there any way to achieve this with regex or with other libraries/packages?


Solution

  • I'm not sure this fits your needs exactly. It converts each name and each corpus sentence into a set of whitespace-separated words, then checks row by row whether the two sets overlap:

    df.Name.str.split().apply(set) & df.Corpus.str.split().apply(set)
    

    Output:

    0     True
    1     True
    2    False
    dtype: bool
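
    For reference, here is a minimal self-contained sketch of the same set-overlap idea. It rebuilds the sample dataframe from the question and writes the result to the Value column. The helper name shares_word is hypothetical (it is not part of pandas), and the sketch makes the conversion to bool explicit instead of depending on how & behaves on object-dtype Series:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["James Bond Junior Bristleback", "Batman Bin Superman", "Thor S/O Odin"],
    "Corpus": [
        "Agent James Bond went missing",
        "Superman saves the day again",
        "Loki was last seen in March 2020",
    ],
})


def shares_word(name: str, corpus: str) -> bool:
    """Hypothetical helper: True if the strings share a whitespace-separated word."""
    return not set(name.split()).isdisjoint(corpus.split())


# Compare row by row and store the result as a boolean column.
df["Value"] = [shares_word(n, c) for n, c in zip(df["Name"], df["Corpus"])]
print(df["Value"].tolist())  # [True, True, False]
```

    Note that this comparison is case-sensitive and works on exact whitespace-separated tokens, so "bond" or "Bond," would not match "Bond". If that matters for your data, lowercasing both strings and stripping punctuation before splitting is one possible extension.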