
Stanford CoreNLP: Get CharacterOffset Annotation from Parse Tree


Using the output of another parser, I have built the parse tree for a sentence. Now I need to find the character offsets of each noun phrase (NP) that occurs in the parse.

How can I go about that?


Solution

  • Take a subtree that corresponds to a Noun Phrase. Get the leaves of this tree:

    List<Tree> leaves = tree.getLeaves();
    

    Then take the begin offset of the first leaf (its CharacterOffsetBeginAnnotation value) and the end offset of the last leaf (its CharacterOffsetEndAnnotation value). The resulting interval is the character span of the NP.

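    To get an offset value, take the leaf's label and cast it to HasOffset. beginPosition() on the first leaf's label gives the start; endPosition() on the last leaf's label gives the end:

    Tree firstLeaf = leaves.get(0);
    Label label = firstLeaf.label();
    // The cast succeeds only if the label implements HasOffset (CoreLabel does)
    HasOffset ofs = (HasOffset) label;
    int start = ofs.beginPosition();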
    

    This works for Stanford CoreNLP 3.2.0.
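
The span logic described above can be sketched without depending on CoreNLP. In the following minimal example, `Node` is a hypothetical stand-in for CoreNLP's `Tree`, and its `begin`/`end` fields stand in for the values you would read through `HasOffset`; the class and method names are assumptions made for illustration, not part of the library. End offsets are treated as exclusive. It walks the tree and, for every subtree labelled NP, records the begin offset of its first leaf and the end offset of its last leaf.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NpSpans {
    // Hypothetical stand-in for a parse-tree node; only leaves carry offsets.
    public static final class Node {
        final String label;
        final List<Node> children;
        final int begin; // offset of the first character (leaves only)
        final int end;   // offset one past the last character (leaves only)

        Node(String label, List<Node> children, int begin, int end) {
            this.label = label;
            this.children = children;
            this.begin = begin;
            this.end = end;
        }

        static Node leaf(String word, int begin, int end) {
            return new Node(word, new ArrayList<Node>(), begin, end);
        }

        static Node of(String label, Node... children) {
            return new Node(label, Arrays.asList(children), -1, -1);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Leftmost leaf under a node.
    static Node firstLeaf(Node n) {
        while (!n.isLeaf()) n = n.children.get(0);
        return n;
    }

    // Rightmost leaf under a node.
    static Node lastLeaf(Node n) {
        while (!n.isLeaf()) n = n.children.get(n.children.size() - 1);
        return n;
    }

    // Pre-order walk: adds {begin, end} for every subtree labelled "NP".
    static void collect(Node n, List<int[]> out) {
        if (n.isLeaf()) return;
        if (n.label.equals("NP")) {
            out.add(new int[] { firstLeaf(n).begin, lastLeaf(n).end });
        }
        for (Node child : n.children) collect(child, out);
    }

    public static List<int[]> npSpans(Node root) {
        List<int[]> out = new ArrayList<int[]>();
        collect(root, out);
        return out;
    }

    // Hand-built tree for the sentence "The cat sat on the mat".
    public static Node example() {
        Node np1 = Node.of("NP",
                Node.of("DT", Node.leaf("The", 0, 3)),
                Node.of("NN", Node.leaf("cat", 4, 7)));
        Node np2 = Node.of("NP",
                Node.of("DT", Node.leaf("the", 15, 18)),
                Node.of("NN", Node.leaf("mat", 19, 22)));
        Node pp = Node.of("PP", Node.of("IN", Node.leaf("on", 12, 14)), np2);
        Node vp = Node.of("VP", Node.of("VBD", Node.leaf("sat", 8, 11)), pp);
        return Node.of("S", np1, vp);
    }

    public static void main(String[] args) {
        String text = "The cat sat on the mat";
        for (int[] span : npSpans(example())) {
            System.out.println(span[0] + "-" + span[1] + ": "
                    + text.substring(span[0], span[1]));
        }
    }
}
```

With a real CoreNLP tree, the same idea would use `getLeaves()` on each NP subtree and the `HasOffset` cast shown above. If the tree was built from another parser's output, the leaf labels may not carry offsets unless you set them yourself, so it is worth checking that before casting.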