Background:
I want to use regular expressions to search for a keyword. However, my keyword has multiple synonyms. For example, the keyword positive can appear as any of the following forms, which I consider equal to positive: "+", "pos", "POS", "Positive", "POSITIVE"
I've tried looking at "Create a dataframe with NLTK synonyms" and http://www.nltk.org/howto/wordnet.html, but I don't think either is quite what I am looking for.
Goals:
1) create synonyms for a given keyword (e.g. positive)
2) search for a keyword (e.g. positive) in a corpus using regular expressions
Example:
toy_corpus = 'patient is POS which makes them ideal to treatment '
I think the steps to getting this would look something like this:
1) define synonyms for positive, e.g. positive = ["pos", "POS", "Positive", "POSITIVE", "+"]
2) use a regular expression to find the keyword POS
Question:
How do I go about achieving this?
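Try it:
import re
question = "patient is POS which makes them ideal to treatment. And the the positive"
find = ["pos", "POS", "positive"]
# \w+ matches each run of word characters, so this splits the text into words
words = re.findall(r"\w+", question)
# keep only the synonyms that occur as whole words in the text
result = [word for word in find if word in words]
print(result)
['POS', 'positive']
Where \w+ matches a run of one or more word characters, so every match is a whole word with punctuation stripped. Wiki: word boundary. More examples: stackoverflow.com. Best Regards!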
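The synonym list from the question can also be folded into a single regular expression, so that the "+" form is handled too. This is only a sketch under some assumptions: the SYNONYMS table and the build_pattern and find_keyword helpers are hypothetical names made up for illustration, and matching is case-insensitive so that "pos", "POS" and "Pos" are all covered by one entry.

```python
import re

# Hypothetical synonym table (illustrative): canonical keyword -> surface forms.
SYNONYMS = {"positive": ["positive", "pos", "+"]}

def build_pattern(forms):
    # re.escape makes "+" match literally instead of acting as a quantifier.
    # Longer forms go first so "positive" is tried before "pos".
    escaped = sorted((re.escape(f) for f in forms), key=len, reverse=True)
    # (?<!\w) and (?!\w) play the role of word boundaries; unlike \b they
    # also work around "+", which is not a word character.
    return re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE)

def find_keyword(text, keyword):
    # Return every synonym of `keyword` found in `text`, in order of appearance.
    return build_pattern(SYNONYMS[keyword]).findall(text)

toy_corpus = "patient is POS which makes them ideal to treatment"
print(find_keyword(toy_corpus, "positive"))  # ['POS']
```

One limitation of this sketch: a "+" glued directly to a word character, as in "A+", is deliberately not matched, which may or may not be what you want for your corpus.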