Tags: regex, python-3.x, nlp, pattern-matching, corpus

Create synonyms and use regular expressions to find keyword


Background:

I want to use regular expressions to search for a keyword, but my keyword has multiple synonyms. For example, I consider all of the following equivalent to the keyword positive: "+", "pos", "POS", "Positive", "POSITIVE".

I've looked at "Create a dataframe with NLTK synonyms" and http://www.nltk.org/howto/wordnet.html, but I don't think either is quite what I am looking for.

Goals:

1) create synonyms for a given keyword (e.g. positive)

2) search for a keyword (e.g. positive) in a corpus using regular expressions

Example:

toy_corpus = 'patient is POS which makes them ideal to treatment '

I think the steps to getting this would look something like this:

1) define synonyms for positive, e.g. positive = ["pos", "POS", "Positive", "POSITIVE", "+"]

2) use a regular expression to find the keyword POS

Question

How do I go about achieving this?


Solution

  • Try it:

    import re

    question = "patient is POS which makes them ideal to treatment. And the the positive"
    find = ["pos", "POS", "positive"]

    # Join the synonyms into one alternation; \b keeps "pos" from matching inside "positive"
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in find) + r")\b"
    result = re.findall(pattern, question)
    print(result)
    # ['POS', 'positive']

    Here \b is a word boundary, so each synonym is matched only as a whole word. Wiki: word boundary. More examples: stackoverflow.com
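The question's synonym list also contains "+" and several capitalizations. Here is a minimal sketch of one way to cover those, assuming a hand-written synonym table. The names `SYNONYMS`, `build_pattern` and `find_keyword` are hypothetical and not part of the original answer. A plain `\b` word boundary cannot be used around "+" because "+" is not a word character, so the sketch uses the lookarounds `(?<!\w)` and `(?!\w)` instead, and `re.IGNORECASE` replaces listing every capitalization by hand.

```python
import re

# Hypothetical table: canonical keyword -> spellings treated as equal to it.
SYNONYMS = {
    "positive": ["positive", "pos", "+"],
}

def build_pattern(variants):
    # Try longer variants first so "positive" is preferred over "pos".
    ordered = sorted(variants, key=len, reverse=True)
    alternation = "|".join(re.escape(v) for v in ordered)
    # (?<!\w) and (?!\w) require that no word character touches the match,
    # which also works for non-word tokens such as "+".
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)

def find_keyword(keyword, text):
    # Return every synonym of `keyword` found in `text`, as written in the text.
    return build_pattern(SYNONYMS[keyword]).findall(text)

toy_corpus = "patient is POS which makes them ideal to treatment "
print(find_keyword("positive", toy_corpus))
# ['POS']
```

Under this boundary rule a "+" glued to a word (for example "HER2+") is not matched. If such cases should count, the lookbehind would need to be relaxed for that variant.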