
Custom NER and POS tagging


I was checking out Stanford CoreNLP in order to understand NER and POS tagging. But what if I want to create custom tags for entities like <title>Nights</title>, <genre>Jazz</genre>, <year>1992</year>? How can I do it? Is CoreNLP useful in this case?


Solution

  • Out of the box, CoreNLP is restricted to the entity types it documents: PERSON, LOCATION, ORGANIZATION, MISC, DATE, TIME, MONEY, NUMBER. No, it will not recognize other entity types just because you assume it could "intuitively" do so :)

    In practice, you will have to choose one of the following:

    1. Find another NER system that already tags those types.
    2. Tackle the tagging task with knowledge-based or unsupervised approaches (e.g. gazetteers and hand-written rules).
    3. Search for additional resources (corpora) that contain the types you want to recognize, and retrain a supervised NER system (CoreNLP or another one) on them.
    4. Build (and possibly annotate) your own resources. You will then have to define an annotation scheme, rules, etc., which is quite an interesting part of the work!
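Option 2 can be cheap to prototype. Below is a minimal sketch of a knowledge-based tagger in plain Python (it does not use CoreNLP at all): it looks up small gazetteers of titles and genres, uses a regular expression for years, and wraps the matches in the same inline tags used in the question. The gazetteer entries, the 1900 to 2099 year range, and the names `find_spans`/`tag` are illustrative assumptions, not part of any library.

```python
import re

# Hypothetical gazetteers: in practice these would come from a knowledge base
# (e.g. a music catalogue); the entries here are illustrative only.
GAZETTEER = {
    "title": ["Nights", "Kind of Blue", "Jazz Samba"],
    "genre": ["Jazz", "Blues", "Hip hop"],
}

# Assumption: a "year" is a whole-word four-digit number from 1900 to 2099.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def find_spans(text):
    """Return non-overlapping (start, end, label) spans, leftmost-longest first."""
    candidates = []
    for label, entries in GAZETTEER.items():
        for entry in entries:
            # Case-sensitive, whole-word match of the gazetteer entry.
            pattern = re.compile(r"\b" + re.escape(entry) + r"\b")
            for m in pattern.finditer(text):
                candidates.append((m.start(), m.end(), label))
    for m in YEAR_PATTERN.finditer(text):
        candidates.append((m.start(), m.end(), "year"))
    # Prefer earlier, then longer, matches; drop anything overlapping a kept span.
    candidates.sort(key=lambda s: (s[0], -(s[1] - s[0])))
    spans, last_end = [], -1
    for start, end, label in candidates:
        if start >= last_end:
            spans.append((start, end, label))
            last_end = end
    return spans


def tag(text):
    """Wrap each recognised span in <label>...</label>, as in the question."""
    out, pos = [], 0
    for start, end, label in find_spans(text):
        out.append(text[pos:start])
        out.append(f"<{label}>{text[start:end]}</{label}>")
        pos = end
    out.append(text[pos:])
    return "".join(out)


if __name__ == "__main__":
    print(tag("Nights is a Jazz album released in 1992."))
```

A plain dictionary lookup like this cannot disambiguate: every capitalized "Nights" becomes a title regardless of context, and "Jazz Samba" wins over "Jazz" only because of the longest-match rule. That lack of context is the main reason to move to options 3 or 4, where a supervised model learns from surrounding words.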

    Indeed, unless you find an existing system that fulfills your needs, some effort will be required! Unsupervised approaches may help you bootstrap a system and find out whether you need to find or annotate a dedicated corpus. In the latter case, it is better to split the data into train/dev/test sets, so that you can assess how well the resulting system performs on unseen data.
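If you go the annotation route, a small script can turn sentences annotated with inline tags (as in the question) into a token-per-line training file and split them into train/dev/test parts. The sketch below assumes the tab-separated "token, label" layout with a blank line between sentences that Stanford NER's CRFClassifier is commonly trained on, IO labels (`O` for tokens outside any entity, no B-/I- prefixes), a naive regex tokenizer standing in for CoreNLP's own, and an 80/10/10 split. The function names are hypothetical.

```python
import random
import re

# Inline annotation such as <title>Nights</title>; the closing tag must match.
TAG = re.compile(r"<(\w+)>(.*?)</\1>")
# Naive tokenizer (an assumption): runs of word characters, or single punctuation marks.
TOKEN = re.compile(r"\w+|[^\w\s]")


def to_token_labels(annotated):
    """Turn '<title>Nights</title> is ...' into [(token, label), ...].

    Tokens outside any tag get the label 'O'; tokens inside a tag get the
    upper-cased tag name (IO encoding).
    """
    rows, pos = [], 0
    for m in TAG.finditer(annotated):
        rows += [(tok, "O") for tok in TOKEN.findall(annotated[pos:m.start()])]
        rows += [(tok, m.group(1).upper()) for tok in TOKEN.findall(m.group(2))]
        pos = m.end()
    rows += [(tok, "O") for tok in TOKEN.findall(annotated[pos:])]
    return rows


def split_data(sentences, dev=0.1, test=0.1, seed=0):
    """Shuffle once with a fixed seed, then cut into (train, dev, test) lists."""
    items = list(sentences)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test)
    n_dev = int(len(items) * dev)
    test_part = items[:n_test]
    dev_part = items[n_test:n_test + n_dev]
    train_part = items[n_test + n_dev:]
    return train_part, dev_part, test_part


def to_tsv(sentences):
    """One 'token<TAB>label' line per token, with a blank line between sentences."""
    blocks = []
    for s in sentences:
        blocks.append("\n".join(f"{tok}\t{lab}" for tok, lab in to_token_labels(s)))
    return "\n\n".join(blocks) + "\n"
```

The train file would then be referenced from a CRFClassifier properties file (the Stanford NER FAQ uses properties such as `trainFile`, `serializeTo` and `map = word=0,answer=1`); verify the exact names against the documentation of your CoreNLP version. Use the dev part for tuning features, and keep the test part untouched until the final evaluation so the score reflects truly unseen data.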