
In Stanford CoreNLP, why are not all proper nouns (NNP) also named entities?


I use Stanford CoreNLP for Named Entity Recognition (NER). I've noticed that in some cases it's not 100% accurate, which is fine and not surprising. However, even when a, say, single-word named entity is not recognized (i.e., its NER label is O), it still has the part-of-speech tag NNP (proper noun).

For example, given the sentence "The RestaurantName in New York is the best outlet.", nerTags() yields [O, O, O, LOCATION, LOCATION, O, O, O, O, O], correctly recognizing only "New York". The parse tree for this sentence looks like

(ROOT
  (S
    (NP
      (NP (DT The) (NNP RestaurantName))
      (PP (IN in)
        (NP (NNP New) (NNP York))))
    (VP (VBZ is)
      (NP (DT the) (JJS best) (NN outlet)))
    (. .)))

so "RestaurantName" is tagged as a proper noun (NNP) even though its NER label is O.

When I look up the definition of a proper noun, it sounds very close to a named entity. What's the difference?


Solution

  • The parser is trained on parse treebank data and the named entity recognizer is trained on separate named entity data for PERSON, LOCATION, ORGANIZATION, MISC.

    I would've thought that RestaurantName might get marked as MISC, but if it's not getting tagged, it means there aren't really examples like that in the training data for named entities. The key point here is that the parse decisions and the named entity decisions are made completely independently of each other, by separate models trained on separate data.
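
Because the two models decide independently, one way to see where they disagree is to line up their per-token outputs and collect the tokens that are tagged NNP (or NNPS) but labeled O by the NER model. The following is only a minimal sketch in plain Java: the class and method names (ProperNounGaps, unrecognizedProperNouns) are made up for illustration, and the three lists are hard-coded from the example in the question. In a real pipeline they would presumably come from the same sentence object that provides nerTags(), which is an assumption here and not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CoreNLP: compares POS tags and NER labels token by token.
class ProperNounGaps {

    /** Returns the tokens tagged as proper nouns (NNP/NNPS) whose NER label is O. */
    static List<String> unrecognizedProperNouns(List<String> words,
                                                List<String> posTags,
                                                List<String> nerTags) {
        if (words.size() != posTags.size() || words.size() != nerTags.size()) {
            throw new IllegalArgumentException("words, posTags and nerTags must have the same length");
        }
        List<String> gaps = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String pos = posTags.get(i);
            boolean properNoun = pos.equals("NNP") || pos.equals("NNPS");
            // The POS tagger says "proper noun" but the NER model assigned no entity type.
            if (properNoun && nerTags.get(i).equals("O")) {
                gaps.add(words.get(i));
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Tokens, POS tags (read off the parse tree) and NER labels from the question.
        List<String> words = List.of("The", "RestaurantName", "in", "New", "York",
                                     "is", "the", "best", "outlet", ".");
        List<String> pos = List.of("DT", "NNP", "IN", "NNP", "NNP",
                                   "VBZ", "DT", "JJS", "NN", ".");
        List<String> ner = List.of("O", "O", "O", "LOCATION", "LOCATION",
                                   "O", "O", "O", "O", "O");
        System.out.println(unrecognizedProperNouns(words, pos, ner)); // prints [RestaurantName]
    }
}
```

Tokens found this way are only candidates. A proper noun is a grammatical category, while a named entity must also fall into one of the types the NER model was trained on, so some NNP tokens will legitimately stay O.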