Tags: python, regex, words

How to remove all words matching a pattern, except certain words I want to preserve (they also match the pattern)?


So I have a pattern I want to strip from a corpus of words; however, there are certain words that match the pattern which I want to keep. I have a list of those words, and I can already remove every word that matches the pattern.

But how do I keep the words in the list while removing all other words that match the pattern?

Thank you.
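To make the requirement concrete, here is a minimal sketch that drops every word matching a pattern from a list of words unless the word appears in a keep list. It assumes the pattern is a regular expression tested with re.search, and that keep-list entries must equal a word exactly (case-sensitive). The function name remove_matching, the pattern ing$ and the sample corpus are hypothetical.

```python
import re
from typing import Iterable, List


def remove_matching(words: Iterable[str], pattern: str, keep: Iterable[str]) -> List[str]:
    """Drop words that match `pattern`, except those listed in `keep`.

    Assumption: `pattern` is tested with re.search, and a word is only
    preserved if it equals a `keep` entry exactly (case-sensitive).
    """
    regex = re.compile(pattern)
    keep_set = set(keep)  # a set gives O(1) membership tests
    # Keep a word if it is whitelisted, or if the pattern does not match it
    return [w for w in words if w in keep_set or not regex.search(w)]


# Hypothetical example: strip words ending in "ing", but keep "king" and "sing"
corpus = ["running", "king", "sing", "table", "walking"]
print(remove_matching(corpus, r"ing$", ["king", "sing"]))
# → ['king', 'sing', 'table']
```

The keep-list check comes first in the condition, so whitelisted words are never tested against the pattern at all.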


Solution

  • You can use set intersection to find which of the words you want to keep occur in the text.

    import re

    s = 'Philip Hammond under pressure after claiming that public sector workers are overpaid'
    # Replace every non-word character with a space, then split into tokens
    s1 = re.sub(r"[^\w]", " ", s).split()
    

    Then intersect the tokens with the list of words to keep:

    d1 = ['Philip', 'Hammond']  # words to keep

    # Keep-list words that also occur in the text
    print(set(s1).intersection(d1))
    

    Output (sets are unordered, so the two names may print in either order):

    {'Philip', 'Hammond'}
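The intersection only reports which keep-list words are present; it does not remove anything. A minimal sketch of the removal itself, assuming you want to work on the original string: re.sub accepts a function as its replacement, so each match can be checked against the keep list and either returned unchanged or replaced with an empty string. The function name remove_matching_words and the pattern \b\w{6,}\b (any word of six or more word characters) are hypothetical; d1 from above is used as the keep list.

```python
import re
from typing import Iterable


def remove_matching_words(text: str, pattern: str, keep: Iterable[str]) -> str:
    """Delete every match of `pattern` from `text` unless the matched word is in `keep`."""
    keep_set = set(keep)

    def replace(match: re.Match) -> str:
        word = match.group(0)
        # Keep whitelisted words unchanged; drop every other match
        return word if word in keep_set else ""

    stripped = re.sub(pattern, replace, text)
    # Deleting words leaves runs of spaces; collapse all whitespace to single spaces
    return " ".join(stripped.split())


s = 'Philip Hammond under pressure after claiming that public sector workers are overpaid'
d1 = ['Philip', 'Hammond']

# Hypothetical pattern: any word of six or more word characters
print(remove_matching_words(s, r"\b\w{6,}\b", d1))
# → Philip Hammond under after that are
```

Note that the final join also turns newlines and tabs into single spaces; drop that step if the original spacing matters.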