I'm doing dependency parsing with the Stanford Parser in Java. Is there any way to get back the character offsets, within my original string, of the words in a dependency? I have tried calling the getSpan() method, but it returns null for every token:
```java
LexicalizedParser lp = LexicalizedParser.loadModel(
        "-maxLength", "80", "-retainTmpSubcategories");
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
Tree parse = lp.apply(text);
GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
Collection<TypedDependency> tdl = gs.typedDependenciesCollapsedTree();
for (TypedDependency td : tdl) {
    td.gov().getSpan();  // it's null!
    td.dep().getSpan();  // it's null!
}
```
Any idea?
I finally ended up writing my own helper functions to get the spans out of my original string:
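```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import edu.stanford.nlp.trees.Tree;

public HashMap<Integer, TokenSpan> getTokenSpans(String text, Tree parse) {
    List<String> tokens = new ArrayList<String>();
    traverse(tokens, parse, parse.getChildrenAsList());
    return extractTokenSpans(text, tokens);
}

// Collects the leaves (the tokens) of the tree in left-to-right order.
private void traverse(List<String> tokens, Tree parse, List<Tree> children) {
    if (children == null) return;
    for (Tree child : children) {
        if (child.isLeaf()) tokens.add(child.value());
        traverse(tokens, parse, child.getChildrenAsList());
    }
}

// Matches the tokens against the original text, one after another.
// Keys are 1-based token indices (like the word indices in the dependencies);
// each TokenSpan holds [start, end) character offsets (end exclusive).
private HashMap<Integer, TokenSpan> extractTokenSpans(String text, List<String> tokens) {
    HashMap<Integer, TokenSpan> result = new HashMap<Integer, TokenSpan>();
    int actCharIndex = 0;
    int actTokenIndex = 0;
    while (actCharIndex < text.length() && actTokenIndex < tokens.size()) {
        if (Character.isWhitespace(text.charAt(actCharIndex))) {
            actCharIndex++;  // skip whitespace between tokens
            continue;
        }
        int spanStart = actCharIndex;
        String actToken = tokens.get(actTokenIndex);
        int tokenCharIndex = 0;
        while (tokenCharIndex < actToken.length() && actCharIndex < text.length()
                && text.charAt(actCharIndex) == actToken.charAt(tokenCharIndex)) {
            tokenCharIndex++;
            actCharIndex++;
        }
        if (tokenCharIndex != actToken.length()) {
            // e.g. escaped tokens such as -LRB- for "(" may not match the raw text
            throw new IllegalStateException("Token '" + actToken + "' not found at offset " + spanStart);
        }
        actTokenIndex++;
        result.put(actTokenIndex, new TokenSpan(spanStart, actCharIndex));
    }
    return result;
}
```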
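The core of extractTokenSpans is a greedy left-to-right alignment of the token strings against the text. Stripped of the Stanford types, the same idea can be sketched in plain Java as below. This is a hypothetical variant, not the code above: it swaps the character-by-character loop for `String.indexOf(token, fromIndex)` and uses `int[] {start, end}` instead of the TokenSpan class; the class name `TokenAligner` is made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; not part of the Stanford API.
class TokenAligner {

    // Returns 1-based token index -> {start, end} character offsets (end exclusive).
    static Map<Integer, int[]> align(String text, List<String> tokens) {
        Map<Integer, int[]> spans = new HashMap<>();
        int from = 0;  // the search resumes where the previous token ended
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            int start = text.indexOf(token, from);
            if (start < 0) {
                throw new IllegalStateException(
                        "Token '" + token + "' not found after offset " + from);
            }
            int end = start + token.length();
            spans.put(i + 1, new int[] {start, end});
            from = end;
        }
        return spans;
    }
}
```

The trade-off: `indexOf` silently skips any characters between tokens (not only whitespace), so it is more forgiving of odd spacing, but if a token is rewritten by the tokenizer it can latch onto a later occurrence instead of failing right away.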
Then I call

```java
getTokenSpans(originalString, parse);
```

So I get a map from every token index to its corresponding span in the original string. It's not an elegant solution, but at least it works.
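To get back to the dependencies, the map key has to be the word index of the governor or dependent. In the Stanford classes that index is exposed by the node's `index()` method, but treat the exact accessor as an assumption for your version. The sketch below is self-contained and uses a hypothetical `Dep` class (relation name plus 1-based governor and dependent indices, with 0 meaning ROOT) in place of `TypedDependency`, and `int[] {start, end}` spans like the ones produced by an alignment helper.

```java
import java.util.Map;

// Hypothetical stand-in for a typed dependency: relation name plus the
// 1-based word indices of governor and dependent (0 means ROOT).
class Dep {
    final String rel;
    final int gov;
    final int dep;

    Dep(String rel, int gov, int dep) {
        this.rel = rel;
        this.gov = gov;
        this.dep = dep;
    }
}

class DepFormatter {
    // Renders e.g. nsubj(loves[5,10], John[0,4]) using the character spans.
    static String format(String text, Map<Integer, int[]> spans, Dep d) {
        return d.rel + "(" + word(text, spans, d.gov) + ", " + word(text, spans, d.dep) + ")";
    }

    private static String word(String text, Map<Integer, int[]> spans, int index) {
        if (index == 0) return "ROOT";  // ROOT has no span in the text
        int[] s = spans.get(index);
        return text.substring(s[0], s[1]) + "[" + s[0] + "," + s[1] + "]";
    }
}
```

Because the spans are end-exclusive, `text.substring(start, end)` returns exactly the original surface form of the word.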