
Removing the zip code from a Python list (to obtain the state name from MapQuest output)


This should be simple, but I could not get it to work.

I have some strings returned to me by the MapQuest geolocation API. I want to isolate the state name from strings like the ones below, which is harder than it looks: 'Pennsylvania Avenue' is a street in D.C., and 'Washington' can be a state, a city, or a street name.

s = "Goldman Sachs Tower, 200, West Street, Battery Park City, Manhattan Community Board 1, New York County, NYC, New York, 10282, United States of America"
s = "9th St NW, Logan Circle/Shaw, Washington, District of Columbia, 20001, United States of America"
s = "Casper, Natrona County, Wyoming, United States of America"

But I noticed that MapQuest writes the state name just before the zip code, near the end of the string.

The following works for obtaining the state name, provided there is a zip code:

s = s.split(",")
s = [x.strip() for x in s]
state = s[-3]

However, when there is no zip code, as in the third string, then I get the county (Natrona County).

I tried to eliminate the zip code by:

s = s.split(",")
s = [x.strip() for x in s if '\d{5}' not in x ]

But the regex '\d{5}' does not work - I want Wyoming, not Natrona County.


Solution

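  • Use the re module. The test '\d{5}' not in x only checks whether that literal text occurs in x; it never runs a regex, so nothing is filtered out: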

    import re
    
    s = "9th St NW, Logan Circle/Shaw, Washington, District of Columbia, 20001, United States of America"
    
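    s = s.split(",")
    # Match any field that contains five consecutive digits (the zip code)
    number = re.compile(r"\d{5}")
    # Keep only the fields without a zip code, stripped of whitespace
    s = [x.strip() for x in s if not number.search(x)]
    print(s)
    print(s[-2])  # the state is now the second-to-last field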
    

    Output:

    ['9th St NW', 'Logan Circle/Shaw', 'Washington', 'District of Columbia', 'United States of America']
    District of Columbia
    

    Here is a short, easy tutorial on regular expressions: regex tutorial
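
To check the approach against all three strings from the question, here is a minimal sketch. The helper name state_from_address and the pattern ZIP_FIELD are hypothetical, not part of MapQuest or the answer above. It assumes the zip code always occupies a whole comma-separated field (five digits, optionally followed by a ZIP+4 suffix) and that the state is the field just before the country. Using fullmatch instead of search is a deliberate variation: it avoids dropping a field such as a street address that merely contains five digits.

```python
import re

# The question's test is a plain substring check, not a regex match:
print(r"\d{5}" in "10282")  # False, "10282" does not contain the text \d{5}

# Assumption: a zip code is an entire field of 5 digits, optionally ZIP+4.
ZIP_FIELD = re.compile(r"\d{5}(?:-\d{4})?")


def state_from_address(address):
    """Return the field before the country, ignoring any zip-code field."""
    parts = [p.strip() for p in address.split(",")]
    # fullmatch requires the whole field to be a zip code
    parts = [p for p in parts if not ZIP_FIELD.fullmatch(p)]
    return parts[-2]


examples = [
    "Goldman Sachs Tower, 200, West Street, Battery Park City, "
    "Manhattan Community Board 1, New York County, NYC, New York, "
    "10282, United States of America",
    "9th St NW, Logan Circle/Shaw, Washington, District of Columbia, "
    "20001, United States of America",
    "Casper, Natrona County, Wyoming, United States of America",
]

for address in examples:
    print(state_from_address(address))
```

Under those assumptions this prints New York, District of Columbia and Wyoming for the three strings. It will still fail on strings that have no country field or that place the state elsewhere, so treat it as a starting point rather than a general address parser.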