nlp opennlp corpus training-data named-entity-recognition

Annotated training data for an NER corpus


The OpenNLP documentation says a model should be trained on 15,000 lines to perform well. I need to extract several different entity types from my documents, which means adding tags to many tokens across those 15,000 lines of training data by hand, and that will take a lot of time. Is there a faster way to do this, or a different approach I could take?

Thanks.


Solution

  • Here are some tools for manually annotating a text corpus:

    GATE http://gate.ac.uk/

    GATE Teamware (web-based) http://gate.ac.uk/teamware/

    XConc Suite http://www-tsujii.is.s.u-tokyo.a...

    Sapient (sentence-based) http://www.aber.ac.uk/en/cs/rese...

    Knowtator (Protégé plug-in) http://knowtator.sourceforge.net/

    CorpusTool http://www.wagsoft.com/CorpusToo...

    UIMA CAS Editor http://uima.apache.org/

    Callisto http://callisto.mitre.org/

    WordFreak http://wordfreak.sourceforge.net/

    MMAX2 http://mmax2.sourceforge.net/

    Reference: https://www.quora.com/Natural-Language-Processing-What-are-the-best-tools-for-manually-annotating-a-text-corpus-with-entities-and-relationships
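
Whichever tool is used, the annotations eventually have to be written in the format the OpenNLP name finder trains on: one sentence per line, whitespace-separated tokens, and each entity wrapped in `<START:type> ... <END>`. Below is a minimal sketch of that conversion step. The span representation (token start index, exclusive end index, entity type) and the function name `to_opennlp_line` are assumptions made for illustration, not the export format of any tool listed above, and the sketch assumes spans do not overlap.

```python
from typing import List, Tuple

# Hypothetical span layout: (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def to_opennlp_line(tokens: List[str], spans: List[Span]) -> str:
    """Render one tokenized sentence as an OpenNLP name finder training line.

    Assumes the spans are non-overlapping and use token indices.
    """
    starts = {start: etype for start, _, etype in spans}
    ends = {end for _, end, _ in spans}

    out = []
    for i, token in enumerate(tokens):
        if i in starts:
            # Open the entity just before its first token.
            out.append(f"<START:{starts[i]}>")
        out.append(token)
        if i + 1 in ends:
            # Close the entity right after its last token.
            out.append("<END>")
    return " ".join(out)


if __name__ == "__main__":
    sentence = "Pierre Vinken , 61 years old , will join the board .".split()
    print(to_opennlp_line(sentence, [(0, 2, "person")]))
    # <START:person> Pierre Vinken <END> , 61 years old , will join the board .
```

Writing one such line per sentence gives a file that can be passed to the name finder trainer; exporting spans from an annotation tool into this shape depends on the tool and is not covered here.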