
Is it possible to get dependency/POS information for entities in spaCy?


I am working on extracting entities from scientific text (I am using scispaCy), and later I want to extract relations using hand-written rules. I have successfully extracted entities and their character spans, and I can also get the POS and dependency tags for tokens and noun chunks. So I am comfortable with the two tasks separately, but I want to bring them together, and I have been stuck on this for a while.

The idea is that I want to be able to write rules such as (just an example): if a sentence/clause contains two entities, where the first is a 'DRUG/CHEMICAL' and is the subject, and the second is a 'DISEASE' and is an object, then infer a 'treatment' relation between them.
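
The rule itself can be sketched independently of spaCy as a plain function over per-entity records. This is a minimal sketch under assumptions: `EntityInfo` and `infer_treatment` are hypothetical names, the `nsubj`/`dobj` labels follow spaCy's English dependency scheme, and the `CHEMICAL`/`DISEASE` label names are assumed from scispaCy's BC5CDR NER model. What it leaves open is exactly the question being asked: how to get a single `dep` value for an entity that spans several tokens.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical record for one entity: its text, its NER label and the
# dependency label of the token that represents it in the parse.
@dataclass
class EntityInfo:
    text: str
    label: str
    dep: str


# Assumed label names (scispaCy's BC5CDR NER model uses CHEMICAL and DISEASE).
DRUG_LABEL = "CHEMICAL"
DISEASE_LABEL = "DISEASE"


def infer_treatment(clause_entities: List[EntityInfo]) -> Optional[Tuple[str, str, str]]:
    """Return (drug, "treatment", disease) if the clause has a drug/chemical
    subject and a disease direct object, otherwise None.
    Only active-voice clauses (nsubj + dobj) are covered by this sketch."""
    drug = next((e for e in clause_entities
                 if e.label == DRUG_LABEL and e.dep == "nsubj"), None)
    disease = next((e for e in clause_entities
                    if e.label == DISEASE_LABEL and e.dep == "dobj"), None)
    if drug is not None and disease is not None:
        return (drug.text, "treatment", disease.text)
    return None


print(infer_treatment([EntityInfo("aspirin", "CHEMICAL", "nsubj"),
                       EntityInfo("headache", "DISEASE", "dobj")]))
# ('aspirin', 'treatment', 'headache')
```

Passive clauses ("migraine is treated with aspirin") would need extra labels such as `nsubjpass`, which this sketch deliberately leaves out.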

If anyone has any hints on how to approach this task, I would really appreciate it. Thank you!

S.

What I am doing to extract entities:

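```python
import spacy

nlp = spacy.load("en_core_sci_sm")  # e.g. a scispaCy model
text = "..."  # text with more than one sentence
doc = nlp(text)

for ent in doc.ents:
    # ... (get information about the ent, e.g. its character span)
    print(ent.text, ent.start_char, ent.end_char)
```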

Getting dependency information (for noun chunks and for tokens):

```python
for chunk in doc.noun_chunks:
    print(f"Text: {chunk.text}, Root text: {chunk.root.text}, Root dep: {chunk.root.dep_}, "
          f"Root head text: {chunk.root.head.text}, Root head POS: {chunk.root.head.pos_}")

for token in doc:
    print(f"Text: {token.text}, DEP label: {token.dep_}, Head text: {token.head.text}, "
          f"Head POS: {token.head.pos_}, Children: {[child for child in token.children]}")
```


Solution

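  • You can use spaCy's built-in `merge_entities` pipeline component to merge each entity into a single token, which would simplify what you're trying to do: the merged token takes its dependency label and head from the entity's root token, so the entity label (`token.ent_type_`) and the dependency label (`token.dep_`) are available on the same token. There's also a `merge_noun_chunks` component that merges noun chunks in the same way.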
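
A minimal sketch of the `merge_entities` approach follows. To stay runnable without downloading a scispaCy model, it builds a `Doc` by hand on a blank English pipeline; the sentence, the parse and the `CHEMICAL`/`DISEASE` labels are hypothetical stand-ins for real model output, and the relation check at the end is only an illustration of the kind of hand-written rule described in the question.

```python
import spacy
from spacy.tokens import Doc

# Blank English pipeline, so no trained model is needed for this sketch.
nlp = spacy.blank("en")

# Hand-written annotations standing in for real parser + NER output
# (hypothetical sentence and labels, for illustration only).
words = ["Acetylsalicylic", "acid", "treats", "chronic", "migraine", "."]
spaces = [True, True, True, True, False, False]
heads = [1, 2, 2, 4, 2, 2]  # absolute index of each token's head
deps = ["amod", "nsubj", "ROOT", "amod", "dobj", "punct"]
pos = ["ADJ", "NOUN", "VERB", "ADJ", "NOUN", "PUNCT"]
ents = ["B-CHEMICAL", "I-CHEMICAL", "O", "B-DISEASE", "I-DISEASE", "O"]
doc = Doc(nlp.vocab, words=words, spaces=spaces, heads=heads, deps=deps,
          pos=pos, ents=ents)

# add_pipe returns the component; calling it merges every entity into one
# token whose dep label and head come from the entity's root token.
merge_entities = nlp.add_pipe("merge_entities")
doc = merge_entities(doc)

for token in doc:
    print(token.text, token.ent_type_ or "-", token.dep_, token.head.text)

# Example rule: a CHEMICAL subject and a DISEASE direct object attached to
# the same verb in the same sentence imply a "treatment" relation.
relations = []
for sent in doc.sents:
    for subj in sent:
        if subj.ent_type_ != "CHEMICAL" or subj.dep_ != "nsubj":
            continue
        for obj in sent:
            if (obj.ent_type_ == "DISEASE" and obj.dep_ == "dobj"
                    and obj.head.i == subj.head.i):
                relations.append((subj.text, "treatment", obj.text))
print(relations)
```

With a trained pipeline, the same effect comes from adding the component once with `nlp.add_pipe("merge_entities")` and then calling `nlp(text)`; `nlp.add_pipe("merge_noun_chunks")` works analogously for noun chunks. Because merging changes token boundaries, token indices collected before the merge no longer line up with the merged `Doc`.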