I have a dataframe like this:
Name Corpus
0 James Bond Junior Bristleback Agent James Bond went missing
1 Batman Bin Superman Superman saves the day again
2 Thor S/O Odin Loki was last seen in March 2020
I want to get this output:
Name Corpus Value
0 James Bond Junior Bristleback Agent James Bond went missing True
1 Batman Bin Superman Superman saves the day again True
2 Thor S/O Odin Loki was last seen in March 2020 False
I tried regex, but I can't get the desired output. Is there any way to achieve this with regex or some other library or package?
Not sure if this exactly fits your needs. It converts each string into a set of whitespace-separated words and checks whether the two sets in each row have any word in common:
df.Name.str.split().apply(set) & df.Corpus.str.split().apply(set)
Output:
0 True
1 True
2 False
dtype: bool
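If you want the result stored as a Value column, here is a more explicit sketch of the same word-overlap idea. The column boundaries in the sample data are my guess, since the printed table does not show where Name ends and Corpus begins, and the helper name shares_word is just something I made up for illustration:

```python
import pandas as pd

# Assumption: the split between Name and Corpus in each row is a guess,
# because the printed table in the question does not show column boundaries.
df = pd.DataFrame({
    "Name": ["James Bond", "Batman Bin Superman", "Thor S/O Odin"],
    "Corpus": [
        "Junior Bristleback Agent James Bond went missing",
        "Superman saves the day again",
        "Loki was last seen in March 2020",
    ],
})


def shares_word(name: str, corpus: str) -> bool:
    """Hypothetical helper: True if any whitespace-separated word is in both strings."""
    return not set(name.split()).isdisjoint(corpus.split())


# Compare the two columns row by row and store the result as a new column.
df["Value"] = [shares_word(n, c) for n, c in zip(df["Name"], df["Corpus"])]

print(df["Value"].tolist())  # [True, True, False]
```

Note that this comparison is case-sensitive and treats punctuation as part of a word, so "Bond" would not match "bond" or "Bond,". If that matters for your data, you would probably want to lowercase both columns and strip punctuation before splitting.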