Dependency parsing with ClearNLP produces a DEPTree object. I have parsed a large corpus and serialized all the data in CoNLL format (e.g., this ClearNLP page on Google code), using the DEPTree#toStringCoNLL() method that ClearNLP provides (scroll down this page to see it). But I can't figure out how to deserialize it: I am looking for something that reads a CoNLL-format parse tree and creates a DEPTree object.
I tried to reverse-engineer toStringCoNLL(), but didn't really understand the inner workings of the code. As a workaround, I have written my own dependency tree class that handles the basic functionality I need, but I would really like to know how to get a DEPTree object instead. So far, I haven't found any method in their API that does this.
Found the answer, so sharing the wisdom on SO :-) ...
The deserialization can be done using the TSVReader in the edu.emory.clir.clearnlp.reader package.
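import java.io.FileInputStream;
import edu.emory.clir.clearnlp.dependency.DEPNode;
import edu.emory.clir.clearnlp.dependency.DEPTree;
import edu.emory.clir.clearnlp.reader.TSVReader;

public void readCoNLL(String inputFile) throws Exception {
    // Column indices of the fields to read from each tab-separated line.
    TSVReader reader = new TSVReader(0, 1, 2, 4, 5, 6, 7);
    reader.open(new FileInputStream(inputFile));
    DEPTree tree;
    // next() returns null once there are no more trees to read.
    while ((tree = reader.next()) != null)
        System.out.println(tree.toString(DEPNode::toStringDEP));
}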
This is provided here by the author of ClearNLP, Jinho Choi.
In older versions (< 3.x) you will need to use the com.clearnlp.reader.DEPReader class instead of TSVReader.
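To make the seven indices passed to the reader less opaque, here is a small plain-Java sketch that does not use ClearNLP at all. It assumes the serialized file follows the CoNLL-X column layout (0 ID, 1 FORM, 2 LEMMA, 3 CPOSTAG, 4 POSTAG, 5 FEATS, 6 HEAD, 7 DEPREL), so that 0, 1, 2, 4, 5, 6, 7 would select everything except the coarse POS tag. That mapping is my reading of the indices, not something I have confirmed against the ClearNLP source, and the CoNLLColumns and Token names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CoNLLColumns {
    /** One token row. Field names are illustrative, not ClearNLP's. */
    public static final class Token {
        public final int id;
        public final String form, lemma, pos, feats;
        public final int headId;
        public final String deprel;

        Token(int id, String form, String lemma, String pos,
              String feats, int headId, String deprel) {
            this.id = id;
            this.form = form;
            this.lemma = lemma;
            this.pos = pos;
            this.feats = feats;
            this.headId = headId;
            this.deprel = deprel;
        }
    }

    /** Picks columns 0, 1, 2, 4, 5, 6, 7 from one tab-separated line (assumed CoNLL-X layout). */
    public static Token parseLine(String line) {
        String[] c = line.split("\t");
        return new Token(Integer.parseInt(c[0]), c[1], c[2], c[4], c[5],
                         Integer.parseInt(c[6]), c[7]);
    }

    /** Groups token lines into sentences; a blank line ends a sentence. */
    public static List<List<Token>> parse(String text) {
        List<List<Token>> sentences = new ArrayList<>();
        List<Token> current = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (line.trim().isEmpty()) {
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(parseLine(line));
            }
        }
        // Keep the last sentence when the input has no trailing blank line.
        if (!current.isEmpty()) sentences.add(current);
        return sentences;
    }
}
```

If your files use a different column order, the indices given to TSVReader would need to change accordingly; the sketch is only meant to show what such a list of column positions could correspond to.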