I have a pattern that I want to strip from a corpus of words, but some words that match the pattern should be kept. I have a list of those words, and I can already remove every word that matches the pattern.
How do I keep the words in the list while removing every other word that matches the pattern?
Thank you.
You can use set intersection to pick out the words you want to keep:
import re
s = 'Philip Hammond under pressure after claiming that public sector workers are overpaid'
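# Replace every non-word character with a space, then split into words.
# The raw string avoids the invalid-escape warning that "\w" triggers.
s1 = re.sub(r"[^\w]", " ", s).split()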
Then intersect the tokens with the list of words to keep:
d1 = ['Philip', 'Hammond']
print(set(s1).intersection(d1))
This prints the following (a set is unordered, so the elements may appear in either order):
{'Philip', 'Hammond'}
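Note that the intersection returns only the listed words, as an unordered set, so the rest of the sentence is lost. If you want to keep the original word order and drop only the pattern matches that are not in your list, a filter along these lines might be closer to what the question asks. This is a minimal sketch: the helper name `strip_matches`, the example pattern (any capitalised word), and the keep list are assumptions made up for illustration, not something from the question.

```python
import re

def strip_matches(words, pattern, keep):
    """Drop words that fully match `pattern`, unless they appear in `keep`."""
    keep_set = set(keep)            # set lookup is fast for long keep lists
    regex = re.compile(pattern)
    # A word survives if it is whitelisted or does not match the pattern.
    return [w for w in words if w in keep_set or not regex.fullmatch(w)]

s = 'Philip Hammond under pressure after claiming that public sector workers are overpaid'
words = re.findall(r"\w+", s)       # same tokens as the re.sub(...).split() approach

pattern = r"[A-Z]\w*"               # hypothetical pattern: capitalised words
keep = ['Philip']                   # hypothetical list of words to keep

result = strip_matches(words, pattern, keep)
print(result)
```

With these assumed inputs, 'Hammond' matches the pattern and is not in the keep list, so it is removed, while 'Philip' and all the lowercase words stay in their original order.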