xml, stanford-nlp, named-entity-recognition, information-extraction

How do I generate XML output from the Stanford NER classifier?


I have used the Stanford NER classifier to classify text. Here is the code:

string docText = fileContent;
string txt = "";
var classified = Classifier.classifyToCharacterOffsets(docText).toArray();

for (int i = 0; i < classified.Length; i++)
{
    Triple triple = (Triple)classified[i];

    int second = Convert.ToInt32(triple.second().ToString());
    int third = Convert.ToInt32(triple.third().ToString());
    txt = txt + ('\t' + triple.first().ToString() + '\t' + docText.Substring(second, third - second));

    string s = Classifier.classifyWithInlineXML(txt);
    string s1 = Classifier.classifyToString(s, "xml", true);
    Panel1.GroupingText = s1;

}

Panel1.Visible = true;

This is the output:

LOCATION    Lanka LOCATION  colombo ORGANIZATION microsoft

But I need the output in XML format, like this:

<LOCATION>  Lanka </LOCATION>   <LOCATION>colombo</LOCATION>    <ORGANIZATION> microsoft</ORGANIZATION> 

In my code I have used

string s = Classifier.classifyWithInlineXML(txt);
string s1 = Classifier.classifyToString(s, "xml", true);

to get the XML, but it's not working. I'm new to this field, so any help resolving this would be appreciated. Thanks a lot.


Solution

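  • Call classifyWithInlineXML on the original text itself, not on the tab-joined string built in the loop. This sample code should be helpful: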

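       import edu.stanford.nlp.ie.AbstractSequenceClassifier;
       import edu.stanford.nlp.ie.crf.CRFClassifier;
       import edu.stanford.nlp.ling.CoreLabel;

       String content = "...";
       String classifierPath = "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz";
       AbstractSequenceClassifier<CoreLabel> asc = CRFClassifier.getClassifierNoExceptions(classifierPath);
       String result = asc.classifyWithInlineXML(content); // entities come back wrapped in tags such as <LOCATION>...</LOCATION>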
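
If you would rather keep the classifyToCharacterOffsets loop from the question, the tags can also be built by hand from the (label, begin, end) triples. Below is a minimal sketch in plain Java with no Stanford NLP dependency. The class name InlineXmlTagger, the Span type, and the hard-coded offsets are hypothetical stand-ins for the classifier's output, and the spans are assumed not to overlap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class InlineXmlTagger {

    /** Hypothetical stand-in for one (label, begin, end) triple from the classifier. */
    public static final class Span {
        final String label;
        final int begin; // inclusive character offset
        final int end;   // exclusive character offset

        public Span(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Escapes the characters that would otherwise be read as markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Wraps each span of the text in <LABEL>...</LABEL> and copies the
     * text between spans unchanged (apart from escaping).
     * Assumes the spans do not overlap.
     */
    public static String tag(String text, List<Span> spans) {
        // Work on a sorted copy so the caller's list is left untouched.
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt(s -> s.begin));

        StringBuilder out = new StringBuilder();
        int pos = 0; // first character not yet copied to the output
        for (Span s : sorted) {
            out.append(escape(text.substring(pos, s.begin)));
            out.append('<').append(s.label).append('>');
            out.append(escape(text.substring(s.begin, s.end)));
            out.append("</").append(s.label).append('>');
            pos = s.end;
        }
        out.append(escape(text.substring(pos)));
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Microsoft has an office in Colombo, Sri Lanka.";
        List<Span> spans = new ArrayList<>();
        // Offsets are hard-coded here; in practice they would come from the triples.
        spans.add(new Span("ORGANIZATION", 0, 9));
        spans.add(new Span("LOCATION", 27, 34));
        spans.add(new Span("LOCATION", 36, 45));
        System.out.println(tag(text, spans));
    }
}
```

In the question's loop, triple.first() would supply the label and triple.second() and triple.third() the begin and end offsets. Unlike the tab-joined string, this keeps the untagged text between entities in the result.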