Stanford Relation Extractor custom model selects only one token of relation entities


I've successfully trained a Relation Extractor model and created a .ser file.

However, I'm running into an issue: the model finds the relation, but when one of its entities consists of multiple tokens, only one token is selected. For example, given a relation called Friend_of and a sentence like:

Sam Tarly's best friend is Jon Snow.

The model will find a relation of type Friend_of between the following entities:

  • Tarly
  • Jon

My tests therefore count this as a false positive, and the model as a whole gets a bad score.

I've tried training a custom NER model using the same training data, and then using this custom NER model to train the RelationExtractor model with the following properties in my props file:

trainUsePipelineNER=true
ner.model=path/to/custom-ner-model.ser.gz

But that didn't solve the problem.

Is this just a problem of not enough training data or is there something I'm missing here?

Here is the Java code I use to get the relations:

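// Imports needed by this snippet (Relation is my own class)
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ie.machinereading.structure.MachineReadingAnnotations;
import edu.stanford.nlp.ie.machinereading.structure.RelationMention;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

Properties props = new Properties();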
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, depparse, relation");
props.put("sup.relation.model", "lib/custom-relation-model-pipeline.ser");
props.put("pos.ptb3Escaping", "false");

StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

List<Relation> foundRelations = new ArrayList<>();

for (String doc : documents) {
    Annotation document = new Annotation(doc);
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

    for (CoreMap sentence : sentences) {

        List<RelationMention> relationMentions = sentence.get(MachineReadingAnnotations.RelationMentionsAnnotation.class);

        for (RelationMention relation : relationMentions) {
            foundRelations.add(new Relation(relation.getArg(0).getValue(), relation.getType(), relation.getArg(1).getValue()));
        }

    }
}

Thank you!

Simon.


Solution

  • So I looked into the MachineReading relation extraction some more.

    I think you want to replace getValue() with getExtentString(), which returns the text of every token in the mention's extent, and see if that helps.

    I ran it on a sample sentence with our default model:

    Joe Smith works at Google.

    And it worked properly.
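
To illustrate why the choice of accessor matters, here is a minimal, self-contained sketch that does not use CoreNLP at all. MentionSketch is a hypothetical stand-in for a relation argument, and it assumes (as the behaviour in the question suggests, but not confirmed from the library source) that one accessor reports a single token of the mention while the other joins every token in its extent. The token indices are hand-picked for the example sentence.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a relation argument; NOT a CoreNLP class.
final class MentionSketch {
    private final List<String> sentenceTokens;
    private final int extentStart; // inclusive token index
    private final int extentEnd;   // exclusive token index
    private final int headIndex;   // the single token a one-token accessor would report

    MentionSketch(List<String> sentenceTokens, int extentStart, int extentEnd, int headIndex) {
        this.sentenceTokens = sentenceTokens;
        this.extentStart = extentStart;
        this.extentEnd = extentEnd;
        this.headIndex = headIndex;
    }

    // Single-token view: the symptom described in the question.
    String headToken() {
        return sentenceTokens.get(headIndex);
    }

    // Full-extent view: all tokens of the mention joined by spaces.
    String extentString() {
        return String.join(" ", sentenceTokens.subList(extentStart, extentEnd));
    }
}

final class ExtentDemo {
    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "Sam", "Tarly", "'s", "best", "friend", "is", "Jon", "Snow", ".");

        // Assumed spans for the two arguments of Friend_of.
        MentionSketch sam = new MentionSketch(tokens, 0, 2, 1);
        MentionSketch jon = new MentionSketch(tokens, 6, 8, 6);

        System.out.println(sam.headToken() + " / " + jon.headToken());       // Tarly / Jon
        System.out.println(sam.extentString() + " / " + jon.extentString()); // Sam Tarly / Jon Snow
    }
}
```

In the loop from the question, the corresponding change would be to call relation.getArg(0).getExtentString() and relation.getArg(1).getExtentString() in place of the two getValue() calls.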