stanford-nlp

Previously tagged NER corpora for training an NER classifier


I am working with Stanford NER models to recognise organisation names in unstructured text. I understand that the training data on which the Stanford NER classifiers were built is not publicly available. I need previously tagged NER corpora that have organisation names tagged, so that I can retrain a Stanford NER model.

One source I am aware of: getting access to the Reuters corpus and combining it with the annotations from the CoNLL-2003 shared task data.

Could I get suggestions or pointers to more sources of previously tagged NER corpora? (I need to request these datasets through my school.)


Solution

  • Do you mean that you wish to retrain on NER data similar to what the original classifier uses, or to avoid the default corpora altogether?

    I'll assume the first. The corpora we use to train the Stanford English NER classifier are:

    In any case, there is a nice longer list of NER datasets available here.
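
Once you have a corpus in CoNLL-2003 column format, you will probably need to reshape it before retraining. Here is a minimal sketch, assuming the usual CoNLL-2003 layout (token first, NER tag last, `-DOCSTART-` lines between documents) and a two-column `token<TAB>label` training file, which is what Stanford NER reads when the properties file contains `map = word=0,answer=1`. The function name `conll_to_two_column` and the choice of `ORGANIZATION` as the output label are illustrative assumptions, not part of any library.

```python
def conll_to_two_column(lines, keep=("ORG",), org_label="ORGANIZATION"):
    """Convert CoNLL-2003 style lines into 'token<TAB>label' lines.

    Only entity types listed in `keep` are retained (relabelled as
    `org_label`); every other token is labelled 'O'.
    """
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            out.append("")  # blank line marks a sentence boundary
            continue
        if line.startswith("-DOCSTART-"):
            continue  # document separator, not a real token
        fields = line.split()
        token, tag = fields[0], fields[-1]
        # Drop the IOB prefix: 'B-ORG' / 'I-ORG' -> 'ORG'; 'O' stays 'O'.
        entity = tag.split("-", 1)[1] if "-" in tag else tag
        label = org_label if entity in keep else "O"
        out.append(f"{token}\t{label}")
    return out


if __name__ == "__main__":
    sample = [
        "-DOCSTART- -X- -X- O",
        "",
        "EU NNP I-NP I-ORG",
        "rejects VBZ I-VP O",
        "German JJ I-NP I-MISC",
        "call NN I-NP O",
        "",
    ]
    print("\n".join(conll_to_two_column(sample)))
```

Collapsing everything except organisations to `O` suits the stated goal of recognising organisation names only; pass a wider `keep` tuple (and adjust the labelling) if you want to retain other entity types.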