python, nlp, spacy

How can I identify the perpetrator and victim in a sentence using NLP?


I am very new to NLP and am looking for topics to explore that could help me identify the roles that entities play in a sentence. Specifically, I want to find the victim and the attacker in sentences like these:

The UK was attacked by China over several weeks.

Over several weeks, China attacked the UK.

Using spaCy, I have identified the subjects, but the grammatical subject changes depending on how the sentence is phrased:

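import spacy

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# The same event in active and passive voice
doc1 = nlp("China attacked the UK over several weeks")
doc2 = nlp("The UK was attacked by China over several weeks")

for doc in (doc1, doc2):
    print("============")
    # For each noun chunk: its text, its root token, the root's
    # dependency label, and the root's syntactic head
    for chunk in doc.noun_chunks:
        print(chunk.text, chunk.root.text, chunk.root.dep_,
              chunk.root.head.text)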

Output:

============
China China nsubj attacked
the UK UK dobj attacked
several weeks weeks pobj over
============
The UK UK nsubjpass attacked
China China pobj by
several weeks weeks pobj over

Any help and direction would be greatly appreciated.


Solution

  • This is called Semantic Role Labelling, and it is hard. In spaCy, our general recommendation is not to model the roles themselves as NER labels. Instead, use generic NER labels like PERSON (or GPE here) together with the dependency parse, and see how far you can get before considering other approaches.

    See section 10 in chapter 4 of the spaCy course for a very specific overview of this problem.

    For an overview of the research on the topic I recommend Jurafsky & Martin's book.
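
    As a rough illustration of the "dependency parse first" approach, here is a minimal sketch that maps the labels from the question's output (nsubj, dobj, nsubjpass, and pobj under "by") onto attacker and victim. The names Tok, from_spacy_doc, and find_roles are hypothetical helpers, not part of spaCy. The sketch assumes that the sentence's ROOT token is the attack verb and that the parser attaches the passive "by" to the verb with the label agent; the token lists are written by hand to mirror the parses shown in the question, so the rules can be checked without loading a model.

```python
from typing import Dict, List, NamedTuple, Optional


class Tok(NamedTuple):
    """Hypothetical minimal token record (not a spaCy class)."""
    text: str  # surface form
    dep: str   # dependency label
    head: int  # index of the syntactic head in the token list


def from_spacy_doc(doc) -> List[Tok]:
    """Convert a parsed spaCy Doc to Tok records via Token.text, .dep_, .head.i."""
    return [Tok(t.text, t.dep_, t.head.i) for t in doc]


def find_roles(tokens: List[Tok]) -> Dict[str, Optional[str]]:
    """Return the doer and receiver of the ROOT verb for active or passive voice."""
    roles: Dict[str, Optional[str]] = {"attacker": None, "victim": None}
    # Assumption: the ROOT token is the verb we care about.
    root = next((i for i, t in enumerate(tokens) if t.dep == "ROOT"), None)
    if root is None:
        return roles
    for i, tok in enumerate(tokens):
        if i == root or tok.head != root:
            continue  # only look at direct dependents of the verb
        if tok.dep == "nsubj":            # active voice: subject is the doer
            roles["attacker"] = tok.text
        elif tok.dep in ("dobj", "obj"):  # active voice: object is the receiver
            roles["victim"] = tok.text
        elif tok.dep == "nsubjpass":      # passive voice: subject is the receiver
            roles["victim"] = tok.text
        elif tok.dep == "agent":          # passive voice: "by" introduces the doer
            for child in tokens:
                if child.head == i and child.dep == "pobj":
                    roles["attacker"] = child.text
    return roles


# Hand-written parses that mirror the question's output.
# "China attacked the UK over several weeks"
ACTIVE = [
    Tok("China", "nsubj", 1), Tok("attacked", "ROOT", 1), Tok("the", "det", 3),
    Tok("UK", "dobj", 1), Tok("over", "prep", 1), Tok("several", "amod", 6),
    Tok("weeks", "pobj", 4),
]
# "The UK was attacked by China over several weeks"
PASSIVE = [
    Tok("The", "det", 1), Tok("UK", "nsubjpass", 3), Tok("was", "auxpass", 3),
    Tok("attacked", "ROOT", 3), Tok("by", "agent", 3), Tok("China", "pobj", 4),
    Tok("over", "prep", 3), Tok("several", "amod", 8), Tok("weeks", "pobj", 6),
]

print(find_roles(ACTIVE))   # {'attacker': 'China', 'victim': 'UK'}
print(find_roles(PASSIVE))  # {'attacker': 'China', 'victim': 'UK'}
```

    With a loaded pipeline, the same rules could be applied as find_roles(from_spacy_doc(nlp("The UK was attacked by China over several weeks"))). The labels a real model assigns may differ from the hand-written ones above, and the function returns only the head token ("UK", not "the UK"), so checking the result against doc.ents for a GPE or PERSON label is a natural next step.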