regex python-3.x nlp pattern-matching corpus

Create synonyms and use regular expressions to find a keyword

Background:

I want to use regular expressions to search for a keyword, but my keyword has multiple synonyms. For example, I treat all of the following as equivalent to the keyword positive: "+", "pos", "POS", "Positive", "POSITIVE"

I've tried looking at "Create a dataframe with NLTK synonyms" and http://www.nltk.org/howto/wordnet.html, but I don't think either is quite what I am looking for.

Goals:

1) create synonyms for a given keyword (e.g. positive)

2) search for a keyword (e.g. positive) in a corpus using regular expressions

Example:

toy_corpus = 'patient is POS which makes them ideal to treatment '

I think the steps to getting this would look something like this:

1) define synonyms for positive, e.g. positive = ["pos", "POS", "Positive", "POSITIVE", "+"]

2) use a regular expression to find the keyword POS

Question

How do I go about achieving this?

Solution

Try it:

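import re

question = "patient is POS which makes them ideal to treatment. And the the positive"
find = ["pos", "POS", "positive"]

# Split the text into words; \w+ matches runs of letters, digits and underscores,
# so punctuation such as the "." after "treatment" is dropped.
words = re.findall(r"\w+", question)
# Keep each synonym that appears as a whole word in the text.
result = [word for word in find if word in words]
print(result)
# ['POS', 'positive']

Here \w+ matches a run of word characters; it is not a word boundary (a word boundary is written \b). Because "+" is not a word character, this approach will not find the "+" synonym. Best regards!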
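If you also need to catch "+" and don't want to list every capitalization, one option is to build a single pattern from the synonym list. The following is only a sketch under a few assumptions: the SYNONYMS table and the helper names build_pattern and find_keyword are made up for illustration, matching is case-insensitive, and a synonym only counts when it is not glued to other word characters.

```python
import re

# Hypothetical synonym table: canonical keyword -> accepted surface forms.
# Case variants are covered by re.IGNORECASE, so each form is listed once.
SYNONYMS = {
    "positive": ["positive", "pos", "+"],
}


def build_pattern(forms):
    """Compile one case-insensitive regex that matches any of the forms."""
    # re.escape makes "+" a literal plus sign instead of a quantifier.
    # Longest first, so "positive" is tried before "pos" in the alternation.
    escaped = sorted((re.escape(form) for form in forms), key=len, reverse=True)
    # Lookarounds are used instead of \b because \b does not match
    # between a space and "+" (neither is a word character).
    return re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE)


def find_keyword(keyword, text):
    """Return every synonym of `keyword` found in `text`, in order of appearance."""
    return build_pattern(SYNONYMS[keyword]).findall(text)


toy_corpus = "patient is POS which makes them ideal to treatment "
print(find_keyword("positive", toy_corpus))
```

With these lookarounds, "pos" inside a longer word such as "posture" is not reported, but neither is a "+" attached directly to a word (for example "A+"); if you need that case, the lookbehind would have to be relaxed.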