How to extract an unlabelled/untyped dependency tree from a TreeAnnotation using Stanford CoreNLP?

The target language is Spanish.

The English pipeline has support for typed dependencies whereas the Spanish pipeline, to my knowledge, does not.

The goal is to turn a TreeAnnotation into a dependency tree, represented as a list of directed edges. Is this possible with CoreNLP 3.4.1 and the Spanish models? If so, how?

Background

I'm using Stanford CoreNLP 3.4.1 together with the 3.5.0 Spanish models for POS tagging (for compatibility reasons, Java 8 cannot be used yet), with the following configuration:

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, ner, parse");
props.setProperty("tokenize.options", "invertible=true,ptb3Escaping=true");
props.setProperty("tokenize.language", "es");

props.setProperty("pos.model", "edu/stanford/nlp/models/pos-tagger/spanish/spanish-distsim.tagger");
props.setProperty("ner.model", "edu/stanford/nlp/models/ner/spanish.ancora.distsim.s512.crf.ser.gz");

props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/spanishSR.ser.gz"); //Stanford Parser 3.4.1 shift-reduce models for Spanish. 

props.setProperty("ner.applyNumericClassifiers", "false");
props.setProperty("ner.useSUTime", "false");

These properties are then used to create the pipeline and annotate a document:

StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);

List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

for(CoreMap sentence: sentences) {

    // ... extract start, end position of sentence ...

    for (CoreLabel token: sentence.get(CoreAnnotations.TokensAnnotation.class)) {

        // ... extract POS tags, NER annotations, id ...
    }

    //This works, and I have a tree that is not empty.
    Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
}

Using a debugger, I examined both the sentences and the tokens and found that they carry the following annotation keys:

Sentence (keys)

From edu.stanford.nlp.ling.CoreAnnotations:

TextAnnotation
CharacterOffsetBeginAnnotation
CharacterOffsetEndAnnotation
TokensAnnotation
TokenBeginAnnotation
TokenEndAnnotation
SentenceIndexAnnotation

From edu.stanford.nlp.trees.TreeCoreAnnotations:

TreeAnnotation

Tokens (keys)

From edu.stanford.nlp.ling.CoreAnnotations:

TextAnnotation
OriginalTextAnnotation
CharacterOffsetBeginAnnotation
CharacterOffsetEndAnnotation
BeforeAnnotation
AfterAnnotation
IndexAnnotation
SentenceIndexAnnotation
PartOfSpeechAnnotation
NamedEntityTagAnnotation

From edu.stanford.nlp.trees.TreeCoreAnnotations:

HeadWordAnnotation (in my experiments this always points to itself, i.e. to the token the annotation was retrieved from)
HeadTagAnnotation

Thanks in advance!

Solution

There is no support for Spanish dependency parsing in CoreNLP at the moment. This includes typed dependency conversion from constituency parses.

A Spanish head finder is implemented, but it is not fully tested. You could hack together an untyped dependency converter using this head finder, but there is no guarantee that it will yield a sensible parse.
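
To illustrate what such an untyped converter might look like, here is a minimal, self-contained sketch of the general technique: find the lexical head of every constituent, then attach the head of each non-head child to the head of its parent. It deliberately does not use CoreNLP classes. Node, HeadRule, Edge and the "rightmost child" toy rule are all hypothetical stand-ins invented for this example. In a real pipeline the role of HeadRule would presumably be played by a CoreNLP HeadFinder (its determineHead(Tree) method picks the head child), and the quality of the edges depends entirely on that head finder. The code avoids lambdas so that it compiles on Java 7.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class UntypedDependencySketch {

    /** Hypothetical minimal constituency node; a stand-in for a parse tree node. */
    static final class Node {
        final String label;        // phrase category, or the word itself for leaves
        final int tokenIndex;      // 1-based token position for leaves, 0 for phrases
        final List<Node> children;

        Node(String label, int tokenIndex, Node... children) {
            this.label = label;
            this.tokenIndex = tokenIndex;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    /** Hypothetical head rule: selects the head child of a phrase node. */
    interface HeadRule {
        Node headChild(Node phrase);
    }

    /** Unlabelled directed edge, governor token -> dependent token (0 = artificial root). */
    static final class Edge {
        final int governor;
        final int dependent;

        Edge(int governor, int dependent) {
            this.governor = governor;
            this.dependent = dependent;
        }

        @Override
        public String toString() {
            return governor + "->" + dependent;
        }
    }

    /** Returns the token index of the lexical head of node and appends edges found below it. */
    static int collect(Node node, HeadRule rule, List<Edge> edges) {
        if (node.isLeaf()) {
            return node.tokenIndex;
        }
        Node headChild = rule.headChild(node);
        int head = collect(headChild, rule, edges);
        for (Node child : node.children) {
            if (child != headChild) {
                // The lexical head of every non-head child depends on the phrase head.
                int dependent = collect(child, rule, edges);
                edges.add(new Edge(head, dependent));
            }
        }
        return head;
    }

    /** Converts a constituency tree into a list of unlabelled directed edges. */
    static List<Edge> edges(Node root, HeadRule rule) {
        List<Edge> edges = new ArrayList<Edge>();
        int rootHead = collect(root, rule, edges);
        edges.add(new Edge(0, rootHead)); // attach the sentence head to the artificial root
        return edges;
    }

    /** Toy example: (S (NP El perro) (VP ladra)) with a "rightmost child is head" rule. */
    static List<Edge> demo() {
        Node np = new Node("NP", 0, new Node("El", 1), new Node("perro", 2));
        Node vp = new Node("VP", 0, new Node("ladra", 3));
        Node s = new Node("S", 0, np, vp);
        // Assumption: a toy rule for illustration only, not a linguistically sound head rule.
        HeadRule rightmost = new HeadRule() {
            @Override
            public Node headChild(Node phrase) {
                return phrase.children.get(phrase.children.size() - 1);
            }
        };
        return edges(s, rightmost);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [2->1, 3->2, 0->3]
    }
}
```

Each edge reads governor->dependent using 1-based token positions, so the result can be joined back to the token indices already extracted from the sentence. As the answer warns, with an untested head finder the resulting edges may not form a sensible parse, so they should be spot-checked on real sentences.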