
Identify names in a string


I would like to find a good way of identifying names of people, places, etc. within users' search queries on my site. For example, if a user asks "how old is George Washington", I need to be able to know from a predefined list that George Washington is a person.

Some of the lists will be global, and some will be user-specific. For example, if they asked "how old is John Smith", I may only want to identify the particular John Smith who is my associate, and I wouldn't want to identify him as a person if he's not my associate.
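A minimal sketch of the list-based part in Python, assuming each list is kept as a plain dictionary from a lowercased name to a label. `GLOBAL_ENTITIES`, `find_entities`, and the sample entries are hypothetical; the user-specific dictionary would come from wherever the site stores each user's associates. It scans the query's tokens and tries the longest phrase first, so a multi-word entry such as "george washington" wins over any shorter entry starting at the same word.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical global list: lowercased name -> label.
GLOBAL_ENTITIES: Dict[str, str] = {
    "george washington": "PERSON",
    "paris": "PLACE",
}


def tokenize(text: str) -> List[str]:
    # Lowercase and keep runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def find_entities(query: str, user_entities: Dict[str, str],
                  max_len: int = 4) -> List[Tuple[str, str]]:
    # User-specific entries override global entries with the same name.
    lexicon = {**GLOBAL_ENTITIES, **user_entities}
    tokens = tokenize(query)
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                found.append((phrase, lexicon[phrase]))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no entry starts here; move to the next token
    return found
```

Because the per-user dictionary is merged in at query time, "how old is John Smith" only yields a match for users whose own list contains "john smith". This is exact matching only; it does nothing about misspellings.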

Is there an NLP library, or some way of crawling these lists, that would let me leverage Soundex, mature NLP, misspelling tolerance, etc.? I could write it by hand, but I would rather use something mature. Thanks.
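Python's standard library has no Soundex, but the American Soundex algorithm is short enough to hand-roll, and `difflib.get_close_matches` (standard library) covers ordinary typos. A sketch combining the two for matching a query phrase against a list of known lowercased names; `soundex`, `phonetic_key`, `fuzzy_matches`, and the 0.8 cutoff are illustrative choices, not tuned values.

```python
import difflib
from typing import Dict, List, Tuple

# American Soundex digit groups.
CODES: Dict[str, str] = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        # Vowels reset the previous code; 'h' and 'w' do not.
        if ch not in "hw":
            prev = code
    return (result + "000")[:4]


def phonetic_key(phrase: str) -> Tuple[str, ...]:
    # One Soundex code per word, so "john smith" -> ("J500", "S530").
    return tuple(soundex(w) for w in phrase.split())


def fuzzy_matches(phrase: str, known_names: List[str],
                  cutoff: float = 0.8) -> List[str]:
    phrase = phrase.lower()
    key = phonetic_key(phrase)
    sound_alike = [n for n in known_names if phonetic_key(n) == key]
    spelled_alike = difflib.get_close_matches(phrase, known_names,
                                              n=3, cutoff=cutoff)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(sound_alike + spelled_alike))
```

Soundex only looks at the first letter plus three consonant codes, so it is coarse ("Robert" and "Rupert" both give R163); pairing it with a spelling-similarity check, or requiring both to agree, is a judgment call that depends on how many false positives the site can tolerate.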


Solution

  • What you need is called Named Entity Recognition (NER).

    One of the best available tools for it is the Stanford Named Entity Recognizer from the Stanford NLP Group (written in Java): http://nlp.stanford.edu/software/CRF-NER.shtml

    If you are on another platform, there are good open-source projects in Ruby and Python; search for "Named Entity Recognition".
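NER and the predefined lists complement each other: the tagger says which words look like a person, and the lists decide which people count for a given user. A sketch of that last step in Python, assuming the tagger's output has already been turned into (token, label) pairs using the Stanford 3-class labels (PERSON, LOCATION, ORGANIZATION, O). The tagged sentence in the test is hand-written, not real tagger output, and `group_entities` and `known_people` are hypothetical helpers.

```python
from typing import List, Set, Tuple


def group_entities(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge consecutive tokens that share a non-'O' label into one span."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    current_label = "O"
    for token, label in tagged:
        if label == current_label and label != "O":
            current.append(token)  # continue the current entity
            continue
        if current:
            spans.append((" ".join(current), current_label))
        # Start a new entity, or nothing if this token is outside any entity.
        current = [token] if label != "O" else []
        current_label = label
    if current:
        spans.append((" ".join(current), current_label))
    return spans


def known_people(tagged: List[Tuple[str, str]], global_people: Set[str],
                 user_people: Set[str]) -> List[str]:
    """Keep PERSON spans only if they appear in the global or user's list."""
    allowed = {name.lower() for name in global_people | user_people}
    return [text for text, label in group_entities(tagged)
            if label == "PERSON" and text.lower() in allowed]
```

Grouping by consecutive identical labels cannot separate two different names that happen to sit next to each other with the same label; that is a limitation of labels without begin/inside markers, not of the lists.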