I am working on information extraction from medical texts (very new to NLP!). At the moment I am interested to find and extract the medications which are mentioned in a predefined list of drugs. For example, consider the text:
"John was prescribed aspirin due to hight temperature"
Thus, given the list of medications (in Python language):
list_of_meds = ['aspirin', 'ibuprofen', 'paracetamol']
The extracted drug is aspirin
. That's fine.
Now consider another case:
"John was prescribed ibuprofen, because he could not tolerate paracetamol"
Now, if I extract the drugs using the list (for example with regular expression), then the extracted drugs are ibuprofen
and paracetamol
.
QUESTION How to separate actually prescribed and untolerated drugs? Is there a way to label prescribed (used) and other mentioned drugs?
This is a complex problem. To capture the nuances around negation, you need to step into the world of dependency parsing and relationship extraction. Couple of paths you can take to add sophistication to your current approach and the add-on by @Jordan are:
Handling negation in relations is not a solved problem. The state of the art around this is usually associated with sentiment analysis. An introduction on using dependency parsing to identify and handle negation is available at this Stanford NLP Sentiment Analysis using RNN page