I'm using Python 3 and have a pandas DataFrame that looks like this:
     zip
0  07105
1  00000
2  07030
3  07032
4  07032
I would like to add state and city columns using the Python package uszipcode:
from uszipcode import SearchEngine

search = SearchEngine(simple_zipcode=False)

def zco(x):
    print(search.by_zipcode(x)['City'])

df['City'] = df[['zip']].fillna(0).astype(int).apply(zco)
However, I get the following error:
TypeError: 'Zipcode' object is not subscriptable
Can someone help with the error? Thank you in advance.
The call search.by_zipcode(x) returns a Zipcode() instance, not a dictionary, so applying ['City'] to that object fails. Instead, use either the .major_city attribute or the shorter alias, the .city attribute; and you want to return that value, not print it:
def zco(x):
    return search.by_zipcode(x).city
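Wired into the dataframe, that would look something like the sketch below (an assumption on my part: by_zipcode() may come back empty for zips that aren't in the dataset, such as 00000, so the hypothetical zco_safe helper guards against that before reading .city):

# Sketch only: one database lookup per row, so this is slow (see further down).
def zco_safe(x):
    result = search.by_zipcode(x)            # Zipcode instance, or nothing useful
    return result.city if result else None   # guard against unknown zips

df['city'] = df['zip'].apply(zco_safe)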
If all you are going to use the uszipcode project for is mapping zip codes to state and city names, you don't need the full database (a 450MB download). Just stick with the 'simple' version, which is only 9MB, by leaving out the simple_zipcode=False argument to SearchEngine().
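Concretely, that just means constructing the search engine with its defaults (a minimal sketch; the size figures are the ones quoted above):

from uszipcode import SearchEngine

search = SearchEngine()                        # 'simple' dataset, ~9MB, has city and state
# search = SearchEngine(simple_zipcode=False)  # full Zipcode model, ~450MB download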
Next, this is going to be really, really slow. .apply() uses a simple loop under the hood, and for each row the .by_zipcode() method will query a SQLite database using SQLAlchemy, create a single result object with all the columns from the matching row, then return that object, just so you can get a single attribute from it.
You'd be much better off querying the database directly with the Pandas SQL methods. The uszipcode package is still useful here, as it handles downloading the database and creating a SQLAlchemy session for you; the SearchEngine.ses attribute gives you direct access to that session. From there I'd just do:
import pandas as pd
from uszipcode import SearchEngine, SimpleZipcode

search = SearchEngine()
query = (
    search.ses.query(
        SimpleZipcode.zipcode.label('zip'),
        SimpleZipcode.major_city.label('city'),
        SimpleZipcode.state.label('state'),
    ).filter(
        SimpleZipcode.zipcode.in_(df['zip'].dropna().unique())
    )
).selectable
zipcode_df = pd.read_sql_query(query, search.ses.connection(), index_col='zip')
to create a Pandas DataFrame with all your unique zip codes mapped to city and state columns. You can then join your dataframe with the zipcode dataframe:
df = pd.merge(df, zipcode_df, how='left', left_on='zip', right_index=True)
This adds city and state columns to your original dataframe. If you need to pull in more columns, add them to the search.ses.query(...) portion, using .label() to give them a suitable column name in the output dataframe (without a .label(), they'll get prefixed with simple_zipcode_ or zipcode_, depending on the class you are using). Pick from the documented model attributes, but take into account that if you need access to the full Zipcode model attributes you need to use SearchEngine(simple_zipcode=False) to ensure you have the full 450MB dataset at your disposal, then use Zipcode.<column>.label(...) instead of SimpleZipcode.<column>.label(...) in the query.
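For example, here is a hedged sketch of the same query with two extra labelled columns (county and population are listed among the SimpleZipcode attributes, but treat the exact names as something to verify against the documentation):

# Sketch: same query as above, extended with two more labelled columns.
query = (
    search.ses.query(
        SimpleZipcode.zipcode.label('zip'),
        SimpleZipcode.major_city.label('city'),
        SimpleZipcode.state.label('state'),
        SimpleZipcode.county.label('county'),          # assumed extra columns; adjust to taste
        SimpleZipcode.population.label('population'),
    ).filter(
        SimpleZipcode.zipcode.in_(df['zip'].dropna().unique())
    )
).selectable
zipcode_df = pd.read_sql_query(query, search.ses.connection(), index_col='zip')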
With the zip codes as the index in the zipcode_df dataframe, that's going to be a lot faster (zippier :-)) than using SQLAlchemy on each row individually.