I am looking on standford corenlp using the Named Entity REcognizer.I have different kinds of input text and i need to tag it into my own Entity.So i started training my own model and it doesnt seems to be working.
For eg: my input text string is "Book of 49 Magazine Articles on Toyota Land Cruiser 1956-1987 Gold Portfolio http://t.co/EqxmY1VmLg http://t.co/F0Vefuoj9Q"
I go through the examples to train my own models and and look for only some words that I am interested in.
My jane-austen-emma-ch1.tsv looks like this
Toyota PERS
Land Cruiser PERS
From the above input text i am only interested in those two words. The one is Toyota and the other word is Land Cruiser.
The austin.prop look like this
trainFile = jane-austen-emma-ch1.tsv
serializeTo = ner-model.ser.gz
map = word=0,answer=1
useClassFeature=true
useWord=true
useNGrams=true
noMidNGrams=true
useDisjunctive=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
Run the following command to generate the ner-model.ser.gz file
java -cp stanford-corenlp-3.4.1.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop austen.prop
public static void main(String[] args) {
String serializedClassifier = "edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz";
String serializedClassifier2 = "C:/standford-ner/ner-model.ser.gz";
try {
NERClassifierCombiner classifier = new NERClassifierCombiner(false, false,
serializedClassifier2,serializedClassifier);
String ss = "Book of 49 Magazine Articles on Toyota Land Cruiser 1956-1987 Gold Portfolio http://t.co/EqxmY1VmLg http://t.co/F0Vefuoj9Q";
System.out.println("---");
List<List<CoreLabel>> out = classifier.classify(ss);
for (List<CoreLabel> sentence : out) {
for (CoreLabel word : sentence) {
System.out.print(word.word() + '/' + word.get(AnswerAnnotation.class) + ' ');
}
System.out.println();
}
} catch (ClassCastException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
Here is the output I am getting
Book/PERS of/PERS 49/O Magazine/PERS Articles/PERS on/O Toyota/PERS Land/PERS Cruiser/PERS 1956-1987/PERS Gold/O Portfolio/PERS http://t.co/EqxmY1VmLg/PERS http://t.co/F0Vefuoj9Q/PERS
which i think its wrong.I am looking for Toyota/PERS and Land Cruiser/PERS(Which is a multi valued fied.
Thanks for the Help.Any help is really appreciated.
The NERClassifier* is word level, that is, it labels words, not phrases. Given that, the classifier seems to be performing fine. If you want, you can hyphenate words that form phrases. So in your labeled examples and in your test examples, you would make "Land Cruiser" to "Land_Cruiser".