python-3.x stanford-nlp opennlp named-entity-recognition

Stanford NER: extract only those names that are mentioned in relation to an (identified) organisation


With the Stanford NER tagger, I am able to extract all PERSONs and ORGANISATIONS as expected. Here is a short snippet:

    # tagger and unique_list are defined elsewhere in my code
    ss = tagger.get_entities(text)
    xorg = unique_list(ss.get('ORGANIZATION'))
    xper = unique_list(ss.get('PERSON'))
    out = (xorg, xper)
    # out is written to the database

My question is: how do I extract only those PERSON names that have a relation to a named ORGANISATION? Specifically, I want the output as a triplet: PERSON, RELATION, ORGANISATION.

For either "Enron Chairman Kenneth Lay" or "Kenneth Lay, Chairman, Enron", I expect the output to read (Kenneth Lay) (Chairman) (Enron).

Any help will be useful.


Solution

  • Plain NER only finds named entities and labels them correctly. Your task is called relation extraction. You should look at the following tools:

    The Stanford Relation Extractor extracts relations between entities: Live_In, Located_In, OrgBased_In, Work_For, and None.

    Stanford OpenIE can extract arbitrary binary relations from text, so running NER beforehand isn't necessary.

    One of these tools may help with your task.
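To make the target output concrete, here is a minimal rule-based sketch of turning NER output into (PERSON, RELATION, ORGANISATION) triples. It is not the Stanford Relation Extractor or OpenIE, only a simple heuristic for patterns like the two in the question. It assumes the tagger output is available as a list of (token, tag) pairs. `ROLE_WORDS`, `group_spans` and `extract_triples` are hypothetical names introduced for illustration.

```python
# Hypothetical heuristic, not part of any Stanford tool.
# Assumed input: (token, tag) pairs with tags such as PERSON, ORGANIZATION, O.

# Illustrative list of role words (an assumption; extend it for real data).
ROLE_WORDS = {"Chairman", "CEO", "President", "Director", "Founder"}


def group_spans(tagged):
    """Merge adjacent tokens that share a label into (text, label) spans.

    Tokens tagged O are dropped, but they still break a run of equal labels.
    """
    spans = []
    prev = None
    for token, tag in tagged:
        label = "ROLE" if token in ROLE_WORDS else tag
        if label == prev and label != "O":
            text, _ = spans[-1]
            spans[-1] = (text + " " + token, label)
        elif label != "O":
            spans.append((token, label))
        prev = label
    return spans


def extract_triples(tagged):
    """Return (person, role, organisation) for each run of three
    neighbouring spans that contains exactly one of each label."""
    spans = group_spans(tagged)
    triples = []
    for i in range(len(spans) - 2):
        window = {label: text for text, label in spans[i:i + 3]}
        if set(window) == {"PERSON", "ROLE", "ORGANIZATION"}:
            triples.append(
                (window["PERSON"], window["ROLE"], window["ORGANIZATION"])
            )
    return triples


tagged = [
    ("Kenneth", "PERSON"), ("Lay", "PERSON"), (",", "O"),
    ("Chairman", "O"), (",", "O"), ("Enron", "ORGANIZATION"),
]
print(extract_triples(tagged))  # [('Kenneth Lay', 'Chairman', 'Enron')]
```

The sketch ignores sentence boundaries and only matches entities that sit next to each other, so it will miss relations expressed with verbs ("Kenneth Lay runs Enron"). Those cases are where a trained relation extractor or OpenIE is the better fit.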