I am new to the Stanford parser. I am trying to split two or more independent sentences that are joined by conjunctions (and, or, etc.) into separate single sentences using the Stanford tree parser.
Example sentence: "Lion and tiger are chasing a deer and fox is chasing a rabbit." I want to split this into the two independent sentences shown below.
1) Lion and tiger are chasing a deer.
2) Fox is chasing a rabbit.
The sentence should be split only at conjunctions that connect independent clauses, not at an "and" that joins two subjects (Lion and tiger) or two objects. If anyone knows how to do this, please help me.
Parse tree structure:
(ROOT
  (S
    (NP (NNP Lion)
      (CC and)
      (NNP tiger))
    (VP (VBP are)
      (VP (VBG chasing)
        (SBAR
          (S
            (NP (DT a) (NNS deer)
              (CC and)
              (NN fox))
            (VP (VBZ is)
              (VP (VBG chasing)
                (NP (DT a) (NN rabbit))))))))))
Thanks.
This parse looks incorrect — did this come from the Stanford Parser? When I input the same sentence on the parser demo page, I get the following:
(ROOT
  (S
    (S
      (NP (NNP Lion)
        (CC and)
        (NNP tiger))
      (VP (VBP are)
        (VP (VBG chasing)
          (NP (DT a) (NNS deer)))))
    (CC and)
    (S
      (NP (NN fox))
      (VP (VBZ is)
        (VP (VBG chasing)
          (NP (DT a) (NN rabbit)))))
    (. .)))
With this parse, extracting the two independent clauses would be fairly easy. You can use Tregex (also part of CoreNLP) to search for sibling clauses (S constituents) with intervening conjunctions (CC nodes).
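To illustrate the sibling-clause idea without depending on CoreNLP, here is a minimal Python sketch that reads the bracketed parse as a string and splits only where an S is followed by a CC and another S under the same parent. It is not Tregex and not the parser's own API: `parse_tree`, `leaves`, `split_clauses`, and `DEMO_PARSE` are hypothetical names made up for this example. In Tregex itself, a pattern along the lines of `S $+ (CC $+ S)` should express the same relation, but treat that as an untested assumption and check it against the Tregex documentation.

```python
import re

# Hypothetical helpers for illustration only; none of these come from CoreNLP.

def parse_tree(text):
    """Parse a Penn-style bracketed string into nested [label, child, ...] lists."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        pos += 1                          # skip "("
        node = [tokens[pos]]              # node label, e.g. "S" or "NP"
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(read())       # nested constituent
            else:
                node.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1                          # skip ")"
        return node

    return read()


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1:] for word in leaves(child)]


def split_clauses(node):
    """Split at S siblings joined by a CC; otherwise keep each S whole."""
    if isinstance(node, str):
        return []
    children = [c for c in node[1:] if not isinstance(c, str)]
    labels = [c[0] for c in children]
    # True when some S is immediately followed by a CC and then another S.
    coordinated = any(
        labels[i] == "S" and labels[i + 1] == "CC" and labels[i + 2] == "S"
        for i in range(len(labels) - 2)
    )
    if coordinated:
        return [cl for c in children if c[0] == "S" for cl in split_clauses(c)]
    if node[0] == "S":
        # A plain clause: a CC inside its NP (e.g. "Lion and tiger") is left alone.
        return [" ".join(leaves(node))]
    return [cl for c in children for cl in split_clauses(c)]


# The parse from the demo page, written on one line.
DEMO_PARSE = (
    "(ROOT (S (S (NP (NNP Lion) (CC and) (NNP tiger)) "
    "(VP (VBP are) (VP (VBG chasing) (NP (DT a) (NNS deer))))) "
    "(CC and) "
    "(S (NP (NN fox)) (VP (VBZ is) (VP (VBG chasing) (NP (DT a) (NN rabbit))))) "
    "(. .)))"
)

for clause in split_clauses(parse_tree(DEMO_PARSE)):
    print(clause + ".")
```

Because the check looks only at S-CC-S siblings, the "and" inside the subject NP never triggers a split. On a parse with no coordinated S nodes, such as the one in the question, the whole sentence comes back as a single clause.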