nlp, pattern-recognition, named-entity-recognition

US state resolution from unstructured text


I have a database with a "location" field that contains unconstrained user input in the form of a string. I would like to map each entry to either a US state or NULL.

For example:

'Southeastern Massachusetts' -> MA
'Brookhaven, NY' -> NY
'Manitowoc' -> WI
'Blue Springs, MO' -> MO
'A Damp & Cold Corner Of The World.' -> NULL
'Baltimore, Maryland' -> MD
'Indiana' -> IN

I can tolerate some errors, but fewer would obviously be better. What is the best way to go about this?


Solution

  • For posterity: I just threw a bunch of regexps at it, which worked 'pretty alright'.
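The answer does not show its regexps, so the following is only a minimal sketch of what such an approach might look like, not the author's actual code. The names `STATES`, `NAME_RE`, `ABBR_RE`, and `resolve_state` are hypothetical, and the state table is deliberately abbreviated to the states in the question's examples; a real table would list all 50.

```python
import re

# Hypothetical, abbreviated lookup table (assumption: only the states that
# appear in the question's examples; extend to all 50 for real use).
STATES = {
    "Massachusetts": "MA",
    "New York": "NY",
    "Wisconsin": "WI",
    "Missouri": "MO",
    "Maryland": "MD",
    "Indiana": "IN",
}

# Full state names, matched case-insensitively on word boundaries.
# Longer names are listed first so the alternation prefers the longest match.
NAME_RE = re.compile(
    r"\b("
    + "|".join(re.escape(n) for n in sorted(STATES, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)

# Two-letter codes count only in upper case, after a comma, at the end of
# the string ("Brookhaven, NY"). This avoids matching ordinary words such
# as "in" or "or".
ABBR_RE = re.compile(r",\s*(" + "|".join(STATES.values()) + r")\.?\s*$")

NAME_TO_CODE = {name.lower(): code for name, code in STATES.items()}


def resolve_state(location):
    """Return a two-letter state code, or None if nothing matches."""
    if not location:
        return None
    match = ABBR_RE.search(location)
    if match:
        return match.group(1)
    match = NAME_RE.search(location)
    if match:
        return NAME_TO_CODE[match.group(1).lower()]
    return None


if __name__ == "__main__":
    for text in ["Southeastern Massachusetts", "Brookhaven, NY", "Manitowoc"]:
        print(repr(text), "->", resolve_state(text))
```

A sketch like this handles entries that name the state or end in ", XX", but it returns None for 'Manitowoc', since resolving a bare city name would need a separate city-to-state lookup that regexps alone do not provide.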