With the Stanford NER tagger, I am able to extract all PERSON and ORGANIZATION entities as expected. Here is a short snippet:
# get_entities returns a dict mapping entity type to the matched strings
ss = tagger.get_entities(text)
xorg = unique_list(ss.get('ORGANIZATION'))
xper = unique_list(ss.get('PERSON'))
out = (xorg, xper)
# out is written to the database
My question is: how do I extract only those PERSON names that have a relation to a named ORGANIZATION? Specifically, I want the output as a triplet: PERSON, RELATION, ORGANIZATION.
For either "Enron Chairman Kenneth Lay" or "Kenneth Lay, Chairman, Enron", I expect the output to read as (Kenneth Lay) (Chairman) (Enron).
Any help will be useful.
Plain NER is only about finding (named) entities and labeling them correctly. Your task is called relation extraction. You should look at the following tools:
Stanford Relation Extractor extracts relations between entities: Live_In, Located_In, OrgBased_In, Work_For, and None.
Stanford OpenIE is able to extract arbitrary binary relations from text. Thus, doing NER isn't necessary beforehand.
Maybe one of these tools helps you with your task.
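If neither tool covers job titles such as "Chairman", a rule-based fallback over the NER output may be enough for the two phrasings in the question. The following is only a minimal sketch of that idea, not how either Stanford tool works. It assumes the tagger output is available as (token, tag) pairs with "O" for untagged tokens; the `TITLES` set and the function names `group_spans` and `extract_triplets` are hypothetical and made up for illustration.

```python
from itertools import groupby

# Hypothetical, hand-picked list of title words; extend it for real data.
TITLES = {"chairman", "ceo", "president", "director", "founder"}


def group_spans(tagged):
    """Merge consecutive tokens that share a tag into (text, tag) spans."""
    spans = []
    for tag, group in groupby(tagged, key=lambda pair: pair[1]):
        text = " ".join(token for token, _ in group)
        spans.append((text, tag))
    return spans


def extract_triplets(tagged):
    """Return (person, relation, organization) triplets.

    A triplet is emitted when an untagged span containing a title word
    sits directly between a PERSON span and an ORGANIZATION span,
    in either order.
    """
    spans = group_spans(tagged)
    triplets = []
    for (left, ltag), (mid, mtag), (right, rtag) in zip(spans, spans[1:], spans[2:]):
        if mtag != "O" or {ltag, rtag} != {"PERSON", "ORGANIZATION"}:
            continue
        titles = [word for word in mid.split() if word.lower() in TITLES]
        if titles:
            person = left if ltag == "PERSON" else right
            org = left if ltag == "ORGANIZATION" else right
            triplets.append((person, " ".join(titles), org))
    return triplets


# Assumed tagger output for the two phrasings from the question.
prefix_form = [("Enron", "ORGANIZATION"), ("Chairman", "O"),
               ("Kenneth", "PERSON"), ("Lay", "PERSON")]
comma_form = [("Kenneth", "PERSON"), ("Lay", "PERSON"), (",", "O"),
              ("Chairman", "O"), (",", "O"), ("Enron", "ORGANIZATION")]

print(extract_triplets(prefix_form))
print(extract_triplets(comma_form))
```

Both inputs yield the triplet (Kenneth Lay, Chairman, Enron). The rule only fires when the title sits directly between the two entities, so it will miss looser phrasings; that is where a trained relation extractor or OpenIE is the better choice.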