machine-learning nlp classification stanford-nlp

Stanford NER classification with additional classes


The current Stanford NER mainly provides 7 classes: LOCATION, TIME, PERSON, ORGANIZATION, MONEY, PERCENT, and DATE. Additionally, it has been trained on English data, so it cannot classify Indian entities.

Is it possible to train the classifier with additional classes so that it can also identify named entities such as product, month, disease, device, etc.?

Also, since it does not classify Indian entities, can support for such non-English entities be added as well?

Is it possible to retrain the classifier and tagger for this additional support?


Solution

  • The major hassle in training a model on additional classes is the training data.
    Models require accurately annotated training data, for example: I bought a <START:product> MacBook Pro <END> in September and synced it with my <START:device> iPhone <END>. Note that iPhone could be annotated as either device or product, so you need to pick one convention and apply it consistently.
    If you can generate or annotate at least 15,000 sentences with the classes you wish to recognise (which is not easy), you are good to go.
    Stanford NER and OpenNLP NER models don't recognise Indian names because the models are trained on Wall Street Journal articles, which are not representative of many names.
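
The annotated sentence above uses OpenNLP's <START:label> ... <END> markup, whereas Stanford NER's CRFClassifier is usually trained from a tab-separated file with one token and its label per line and a blank line between sentences. As a rough sketch only, the conversion could look like the following. The function names to_token_rows and to_tsv are made up for illustration, tokens are split on whitespace (an assumption; a real pipeline would use a proper tokenizer), and unlabelled tokens get the background label O.

```python
import re

# Matches one OpenNLP-style span: <START:label> some tokens <END>
SPAN = re.compile(r"<START:(\w+)>\s*(.*?)\s*<END>")


def to_token_rows(sentence, outside="O"):
    """Return (token, label) pairs for one annotated sentence.

    Hypothetical helper: tokens are split on whitespace only.
    """
    rows = []
    pos = 0
    for match in SPAN.finditer(sentence):
        # Text before the span carries the background label.
        rows += [(tok, outside) for tok in sentence[pos:match.start()].split()]
        # Every token inside the span carries the span's label.
        rows += [(tok, match.group(1)) for tok in match.group(2).split()]
        pos = match.end()
    # Remaining text after the last span.
    rows += [(tok, outside) for tok in sentence[pos:].split()]
    return rows


def to_tsv(sentences):
    """Join sentences as token<TAB>label lines, blank line between sentences."""
    blocks = [
        "\n".join(f"{tok}\t{label}" for tok, label in to_token_rows(s))
        for s in sentences
    ]
    return "\n\n".join(blocks) + "\n"


example = (
    "I bought a <START:product> MacBook Pro <END> in September and "
    "synced it with my <START:device> iPhone <END> ."
)
rows = to_token_rows(example)
tsv = to_tsv([example])
```

With Stanford NER, a file like this is typically referenced by the trainFile property, with map = word=0,answer=1 describing the two columns; the Stanford NER FAQ lists the remaining properties needed to train a model.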