How does one get the subject of a sentence (in a general way) using the SemanticGraph component from Stanford CoreNLP?
I've tried the code posted below, but the output indicates subject is null.
String sentence = "Carl has 84 Skittles.";
Annotation doc = InitUtil.initStanford(sentence, "tokenize, ssplit, pos, lemma, ner, parse");
SemanticGraph semGraph = doc.get(SENTENCE).get(0).get(DEPENDENCIES);
IndexedWord verb = semGraph.getFirstRoot();
IndexedWord subject = semGraph.getChildWithReln(verb, GrammaticalRelation.valueOf("nsubj"));
System.out.println(subject);
If I try the same code replacing the second to last line with the 3 lines below, I get the expected output of "Carl". The difference appears to be a private field of GrammaticalRelation called specific, but the value of this field appears to be sentence-specific. My question is how to get the subject in a way that can be applied to all or nearly all sentences.
Set<GrammaticalRelation> relations = semGraph.childRelns(verb);
GrammaticalRelation relation = relations.iterator().next();
IndexedWord subject = semGraph.getChildWithReln(verb, relation);
It turns out the problem wasn't with the specific field. SemanticGraph.getChildWithReln relies on GrammaticalRelation.equals(), which checks whether the languages of the two objects are compatible. GrammaticalRelation.valueOf(String) returns a GrammaticalRelation whose language is Language.English, while the Stanford Parser uses Language.UniversalEnglish. The two languages are incompatible for some reason. Changing the call from GrammaticalRelation.valueOf(String) to GrammaticalRelation.valueOf(Language, String), with Language.UniversalEnglish as the first argument, solved the problem.
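To make the failure mode concrete, here is a toy model that runs without CoreNLP on the classpath. Everything in it is a hypothetical stand-in: Lang, Relation, childWithReln and exampleEdges are made-up names, not CoreNLP classes or methods, and the equality check is simplified to "same language and same short name", which may be stricter than whatever compatibility rule the library really applies. It only illustrates why a relation built with the wrong language finds nothing even though the short name "nsubj" matches.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Toy model only: hypothetical stand-ins, not the CoreNLP API.
class RelationLookupSketch {

    // Stand-in for the language tag carried by each relation.
    enum Lang { ENGLISH, UNIVERSAL_ENGLISH }

    // Stand-in for GrammaticalRelation: equality also depends on the language.
    static final class Relation {
        final Lang lang;
        final String shortName;

        Relation(Lang lang, String shortName) {
            this.lang = lang;
            this.shortName = shortName;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Relation)) return false;
            Relation other = (Relation) o;
            // Assumption: simplified to strict language equality.
            return lang == other.lang && shortName.equals(other.shortName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(lang, shortName);
        }
    }

    // Stand-in for getChildWithReln: first child whose edge equals reln, else null.
    static String childWithReln(Map<Relation, String> edges, Relation reln) {
        for (Map.Entry<Relation, String> e : edges.entrySet()) {
            if (e.getKey().equals(reln)) return e.getValue();
        }
        return null;
    }

    // Made-up outgoing edges of the root "has", all tagged UNIVERSAL_ENGLISH.
    static Map<Relation, String> exampleEdges() {
        Map<Relation, String> edges = new LinkedHashMap<>();
        edges.put(new Relation(Lang.UNIVERSAL_ENGLISH, "nsubj"), "Carl");
        edges.put(new Relation(Lang.UNIVERSAL_ENGLISH, "dobj"), "Skittles");
        return edges;
    }

    public static void main(String[] args) {
        Map<Relation, String> edges = exampleEdges();
        // Same short name, different language: no edge matches.
        System.out.println(childWithReln(edges, new Relation(Lang.ENGLISH, "nsubj")));
        // Same short name and same language: the subject is found.
        System.out.println(childWithReln(edges, new Relation(Lang.UNIVERSAL_ENGLISH, "nsubj")));
    }
}
```

The first lookup prints null and the second prints Carl, which mirrors what I saw with the one-argument and two-argument forms of valueOf.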