
Identifying geographical locations in text


What kind of work has been done to determine whether a specific string pertains to a geographical location? For example:

'troy, ny'
'austin, texas'
'hotels in las vegas, nv'

I'm guessing the answer is a statistical approach that assigns a degree of confidence that the first two are locations. The last one would probably need a heuristic that grabs the "%s, %s" pattern and then applies the same technique. I'm specifically looking for approaches that don't rely too heavily on the preposition 'in', since it is neither an unambiguous nor a consistently available indicator of location.
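
A minimal sketch of the "%s, %s" heuristic described above, assuming a gazetteer of (city, state) pairs is available. The tiny GAZETTEER and STATES tables and the function names are hypothetical stand-ins; a real system would load a large place list such as GeoNames. The confidence value is a crude hand-set heuristic, not a learned or calibrated probability, so it only shows where a statistical score would plug in. Trying every suffix of the text before the comma is what lets "hotels in las vegas, nv" resolve without looking at "in".

```python
from typing import Optional, Tuple

# Hypothetical toy gazetteer of (city, state-code) pairs; a real system
# would load a large place list (e.g. GeoNames) instead.
GAZETTEER = {("troy", "ny"), ("austin", "tx"), ("las vegas", "nv")}

# Hypothetical state-name normalization table (full name -> postal code).
STATES = {"new york": "ny", "texas": "tx", "nevada": "nv"}
STATE_CODES = set(STATES.values())


def normalize_state(text: str) -> Optional[str]:
    """Map 'texas' or 'tx' to 'tx'; return None if unrecognized."""
    text = text.strip().lower()
    if text in STATE_CODES:
        return text
    return STATES.get(text)


def find_location(query: str) -> Tuple[Optional[str], float]:
    """Look for a '<city>, <state>' span and return (span, confidence).

    Confidence is a hand-set heuristic, not a calibrated probability:
    1.0 if the (city, state) pair is in the gazetteer, 0.5 if only the
    part after the comma is a known state, 0.0 otherwise.
    """
    if query.count(",") != 1:
        return None, 0.0
    left, right = query.lower().split(",")
    state = normalize_state(right)
    if state is None:
        return None, 0.0
    words = left.split()
    # Try every suffix of the left side as the city name, longest first,
    # so "hotels in las vegas" yields "las vegas" without relying on "in".
    for start in range(len(words)):
        city = " ".join(words[start:])
        if (city, state) in GAZETTEER:
            return f"{city}, {state}", 1.0
    # Only the state was recognized.
    return state, 0.5
```

With these toy tables, `find_location("hotels in las vegas, nv")` returns `("las vegas, nv", 1.0)` and `find_location("austin, texas")` returns `("austin, tx", 1.0)`.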

Can anyone point me to approaches, papers, or existing utilities? Thanks!


Solution

  • The problem you describe is often called geographic query parsing or, more generally, geographic information retrieval.

    There was a recent task on exactly this at CLEF 2007, the GeoCLEF query-parsing task (http://www.uni-hildesheim.de/geoclef/2007/Query-Parsing.htm). The winning team used a rule-based grammar, which is probably close to what you want to avoid. A paper at WWW 2009 describes GeoParser: http://www2009.eprints.org/239/.

    There are also some papers on geographic information retrieval from the GIR'07 workshop at CIKM 2007: http://www.geo.unizh.ch/~rsp/gir07/accepted.html

    I don't know of any open-source software that does this, but it may be bundled into a search engine such as Lemur.
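
To make "geographic query parsing" concrete, here is a minimal sketch of a gazetteer-based parser that splits a query into a "what" part, an optional spatial relation, and a "where" part. This is not the rule-based grammar from the CLEF task; it is a swapped-in longest-match dictionary lookup, and PLACES, RELATIONS and parse_query are hypothetical names. The relation word is used when present but never required, so "hotels las vegas" parses the same way as "hotels in las vegas".

```python
from typing import Dict, Optional

# Hypothetical toy gazetteer: lower-cased place names as token tuples.
PLACES = {
    ("las", "vegas"), ("las", "vegas", "nv"),
    ("austin",), ("austin", "texas"),
    ("troy",), ("troy", "ny"),
}
MAX_LEN = max(len(p) for p in PLACES)

# Optional spatial-relation words: split off if present, never required.
RELATIONS = {"in", "near", "around", "at"}


def parse_query(query: str) -> Dict[str, Optional[str]]:
    """Split a query into what / relation / where via longest gazetteer match."""
    tokens = query.lower().replace(",", " ").split()
    best = None  # (start, end) token indices of the longest place match
    for i in range(len(tokens)):
        # Try the longest n-gram starting at i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in PLACES:
                if best is None or n > best[1] - best[0]:
                    best = (i, i + n)
                break  # longest match at position i found; move on
    if best is None:
        return {"what": " ".join(tokens) or None, "relation": None, "where": None}
    start, end = best
    what_tokens = tokens[:start] + tokens[end:]
    relation = None
    # If the word just before the place is a relation word, split it off.
    if start > 0 and tokens[start - 1] in RELATIONS:
        relation = tokens[start - 1]
        what_tokens = tokens[:start - 1] + tokens[end:]
    return {
        "what": " ".join(what_tokens) or None,
        "relation": relation,
        "where": " ".join(tokens[start:end]),
    }
```

A pure gazetteer lookup like this cannot tell Troy, NY from the film "Troy"; that ambiguity is where a statistical model scoring the surrounding context would come in.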