Tags: stanford-nlp, opennlp, pos-tagger, part-of-speech, simplenlg

How to extract the subject, verb, and object from every sentence using NLP in Java?


I want to find the subject, verb, and object of each sentence and then pass them to the natural language generation library SimpleNLG to form a sentence.

I tried multiple libraries, such as CoreNLP, OpenNLP, and the Stanford parser, but I cannot extract them accurately.

In the worst case, I will have to write a long chain of if-else rules to find the subject, verb, and object of each sentence, and those rules are not always accurate enough for SimpleNLG.

For example:

  • NN, nsubj, etc. go to the subject; VB, VBZ go to the verb.
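For reference, the if-else approach over POS tags might look roughly like the following. This is a minimal sketch, not a library API: `PosSvoHeuristic` and `guess` are hypothetical names, and it assumes the tokens are already tagged in word/TAG form (adjust the separator to whatever your tagger emits). It takes the first verb as the verb, the first noun or pronoun before it as the subject, and the first one after it as the object. Dependency labels such as nsubj are not used here.

```java
/** Naive POS-tag heuristic for subject/verb/object (hypothetical helper, illustration only). */
class PosSvoHeuristic {

    /** Returns {subject, verb, object}; an entry stays null when nothing matches. */
    static String[] guess(String taggedSentence) {
        String subject = null;
        String verb = null;
        String object = null;
        for (String token : taggedSentence.trim().split("\\s+")) {
            int slash = token.lastIndexOf('/');
            if (slash <= 0) {
                continue; // token is not in word/TAG form
            }
            String word = token.substring(0, slash);
            String tag = token.substring(slash + 1);
            // Nouns (NN, NNS, NNP, NNPS) and personal pronouns count as arguments
            boolean nounLike = tag.startsWith("NN") || tag.equals("PRP");
            if (verb == null && tag.startsWith("VB")) {
                verb = word;    // first verb tag (VB, VBZ, VBP, ...) is the verb
            } else if (verb == null && subject == null && nounLike) {
                subject = word; // first noun/pronoun before the verb
            } else if (verb != null && object == null && nounLike) {
                object = word;  // first noun/pronoun after the verb
            }
        }
        return new String[] { subject, verb, object };
    }

    public static void main(String[] args) {
        String[] svo = guess("I/PRP use/VBP a/DT parser/NN ./.");
        System.out.println("subject=" + svo[0] + " verb=" + svo[1] + " object=" + svo[2]);
    }
}
```

On "I/PRP use/VBP a/DT parser/NN ./." it prints `subject=I verb=use object=parser`, but on "This/DT is/VBZ an/DT easy/JJ sentence/NN ./." it finds no subject, because "This" carries a DT tag in that input. Gaps like this are why purely tag-based rules tend to be fragile.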

I tried the lexicalized parser:

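import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.*;
import java.util.List;

// Load the English PCFG model from the models jar on the classpath
LexicalizedParser lp = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
String[] sent = { "This", "is", "an", "easy", "sentence", "." };
List<CoreLabel> rawWords = SentenceUtils.toCoreLabelList(sent);
Tree parse = lp.apply(rawWords);
parse.pennPrint();
System.out.println();
TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
tp.printTree(parse);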

which, for the sentence "I use a parser", gives typed dependencies like these:

nsubj(use-2, I-1)
root(ROOT-0, use-2)
det(parser-4, a-3)
dobj(use-2, parser-4)

What I want is something like this:

subject = I
verb = use
det = a
object = parser
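If you stay with the dependency output, the mapping described here is a small lookup from relation name to role rather than a long if-else chain over POS tags. The sketch below assumes the dependencies are available as text lines in the `relation(governor-index, dependent-index)` form shown above; `DependencyRoles` and `extract` are hypothetical names, and only the root, nsubj, dobj/obj and det relations are handled. Accepting `obj` as well as `dobj` is an assumption meant to cover newer Universal Dependencies output; check the labels your CoreNLP version actually prints.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Maps printed typed dependencies to subject/verb/det/object roles (hypothetical helper). */
class DependencyRoles {

    // Matches lines like "nsubj(use-2, I-1)": group 1 = relation, group 2 = dependent word.
    private static final Pattern DEP =
            Pattern.compile("([\\w:]+)\\(.+?-\\d+'*, (.+?)-\\d+'*\\)");

    static Map<String, String> extract(String[] dependencyLines) {
        // LinkedHashMap keeps the roles in the order they are first found
        Map<String, String> roles = new LinkedHashMap<>();
        for (String line : dependencyLines) {
            Matcher m = DEP.matcher(line.trim());
            if (!m.matches()) {
                continue; // skip tree output, blank lines, etc.
            }
            String relation = m.group(1);
            String dependent = m.group(2);
            switch (relation) {
                case "root":  roles.put("verb", dependent); break;
                case "nsubj": roles.put("subject", dependent); break;
                case "dobj":  // Stanford Dependencies label for a direct object
                case "obj":   roles.put("object", dependent); break; // assumed UD v2 label
                case "det":   roles.put("det", dependent); break;
                default:      break; // every other relation is ignored in this sketch
            }
        }
        return roles;
    }

    public static void main(String[] args) {
        String[] deps = {
            "nsubj(use-2, I-1)", "root(ROOT-0, use-2)",
            "det(parser-4, a-3)", "dobj(use-2, parser-4)"
        };
        for (Map.Entry<String, String> e : extract(deps).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Running `main` prints exactly the four lines of the desired output above. In practice you would probably read the parser's dependency objects directly instead of re-parsing printed text, and you would still need extra rules for passives, clausal complements, and sentences with several verbs.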

Is there a simpler way to do this in Java, or should I go with if-else rules? Any help is appreciated.


Solution

  • You can use the openie annotator in Stanford CoreNLP to get (subject, relation, object) triples. You can run it from the command line or build a pipeline with these annotators in Java.

    command:

    java -Xmx10g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,depparse,natlog,openie -file example.txt
    

    Java:

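    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import java.util.Properties;
    // openie relies on the dependency parse (depparse) and natural logic (natlog) annotations
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,depparse,natlog,openie");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    Annotation result = pipeline.process("...");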
    

    input:

    Joe ate some pizza.
    

    output:

    Extracted the following Open IE triples:
    1.0     Joe     ate     pizza
    

    More details here: https://stanfordnlp.github.io/CoreNLP/openie.html
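When the pipeline is run from the command line, the triples end up in the text output, one per line. The sketch below reads those lines back into subject/verb/object fields that could then be passed on to SimpleNLG. It assumes each triple line holds four tab-separated fields (confidence, subject, relation, object), as in the example output above; check this against your own output files. `OpenIeTriple`, `parse` and `parseAll` are hypothetical names. When using the Java pipeline instead, the linked documentation shows how to read the triples from the annotated sentences directly, which avoids re-parsing text.

```java
import java.util.ArrayList;
import java.util.List;

/** One Open IE triple read back from CoreNLP's text output (hypothetical helper). */
class OpenIeTriple {
    final double confidence;
    final String subject;
    final String relation;
    final String object;

    OpenIeTriple(double confidence, String subject, String relation, String object) {
        this.confidence = confidence;
        this.subject = subject;
        this.relation = relation;
        this.object = object;
    }

    /** Parses "confidence<TAB>subject<TAB>relation<TAB>object"; returns null otherwise. */
    static OpenIeTriple parse(String line) {
        String[] fields = line.trim().split("\t");
        if (fields.length != 4) {
            return null; // header lines, blank lines, other annotator output
        }
        try {
            return new OpenIeTriple(Double.parseDouble(fields[0].trim()),
                    fields[1].trim(), fields[2].trim(), fields[3].trim());
        } catch (NumberFormatException e) {
            return null; // first field was not a confidence score
        }
    }

    /** Keeps only the triples whose confidence is at least minConfidence. */
    static List<OpenIeTriple> parseAll(List<String> lines, double minConfidence) {
        List<OpenIeTriple> triples = new ArrayList<>();
        for (String line : lines) {
            OpenIeTriple t = parse(line);
            if (t != null && t.confidence >= minConfidence) {
                triples.add(t);
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("Extracted the following Open IE triples:");
        lines.add("1.0\tJoe\tate\tpizza");
        for (OpenIeTriple t : parseAll(lines, 0.5)) {
            System.out.println("subject = " + t.subject + ", verb = " + t.relation + ", object = " + t.object);
        }
    }
}
```

Running `main` prints `subject = Joe, verb = ate, object = pizza`. The confidence threshold is only an illustrative filter, since Open IE may return several triples for one sentence. From there, SimpleNLG's clause API (`SPhraseSpec` with `setSubject`, `setVerb` and `setObject`) is one place the three fields could go; because Open IE returns the surface form of the relation ("ate"), you may want its lemma if you plan to let SimpleNLG choose the tense.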