I'm using Python 3 and have a pandas DataFrame that looks like this:
     zip
0  07105
1  00000
2  07030
3  07032
4  07032
I would like to add state and city columns using the Python package uszipcode:
from uszipcode import SearchEngine

search = SearchEngine(simple_zipcode=False)

def zco(x):
    print(search.by_zipcode(x)['City'])

df['City'] = df[['zip']].fillna(0).astype(int).apply(zco)
However, I get the following error:
TypeError: 'Zipcode' object is not subscriptable
Can someone help with the error? Thank you in advance.
The call search.by_zipcode(x) returns a Zipcode() instance, not a dictionary, so applying ['City'] to that object fails. Instead, use either the .major_city attribute or the shorter alias, the .city attribute; and you want to return that value, not print it:
def zco(x):
    return search.by_zipcode(x).city
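Wired into the dataframe, that would look something like the sketch below (an assumption on my part: by_zipcode() may come back empty for zips that aren't in the dataset, such as 00000, so the hypothetical zco_safe helper guards against that before reading .city):

# Sketch only: one database lookup per row, so this is slow (see further down).
def zco_safe(x):
    result = search.by_zipcode(x)            # Zipcode instance, or nothing useful
    return result.city if result else None   # guard against unknown zips

df['city'] = df['zip'].apply(zco_safe)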
If all you are going to use the uszipcode project for is mapping zip codes to state and city names, you don't need the full database (a 450MB download). Just stick with the 'simple' version, which is only 9MB, by leaving out the simple_zipcode=False argument to SearchEngine().
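Concretely, that just means constructing the search engine with its defaults (a minimal sketch; the size figures are the ones quoted above):

from uszipcode import SearchEngine

search = SearchEngine()                        # 'simple' dataset, ~9MB, has city and state
# search = SearchEngine(simple_zipcode=False)  # full Zipcode model, ~450MB download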
Next, this is going to be really, really slow. .apply() uses a simple loop under the hood, and for each row the .by_zipcode() method will query a SQLite database using SQLAlchemy, create a single result object with all the columns from the matching row, then return that object, just so you can get a single attribute from it.
You'd be much better off querying the database directly with the Pandas SQL methods. The uszipcode package is still useful here, as it handles downloading the database and creating a SQLAlchemy session for you; the SearchEngine.ses attribute gives you direct access to that session. From there I'd just do:
import pandas as pd
from uszipcode import SearchEngine, SimpleZipcode

search = SearchEngine()
query = (
    search.ses.query(
        SimpleZipcode.zipcode.label('zip'),
        SimpleZipcode.major_city.label('city'),
        SimpleZipcode.state.label('state'),
    ).filter(
        SimpleZipcode.zipcode.in_(df['zip'].dropna().unique())
    )
).selectable
zipcode_df = pd.read_sql_query(query, search.ses.connection(), index_col='zip')
to create a Pandas DataFrame with all your unique zip codes mapped to city and state columns. You can then join your dataframe with the zipcode dataframe:
df = pd.merge(df, zipcode_df, how='left', left_on='zip', right_index=True)
This adds city and state columns to your original dataframe. If you need to pull in more columns, add them to the search.ses.query(...) portion, using .label() to give them a suitable column name in the output dataframe (without a .label(), they'll get prefixed with simple_zipcode_ or zipcode_, depending on the class you are using). Pick from the documented model attributes, but take into account that if you need access to the full Zipcode model attributes you need to use SearchEngine(simple_zipcode=False) to ensure you have the full 450MB dataset at your disposal, then use Zipcode.<column>.label(...) instead of SimpleZipcode.<column>.label(...) in the query.
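For example, here is a hedged sketch of the same query with two extra labelled columns (county and population are listed among the SimpleZipcode attributes, but treat the exact names as something to verify against the documentation):

# Sketch: same query as above, extended with two more labelled columns.
query = (
    search.ses.query(
        SimpleZipcode.zipcode.label('zip'),
        SimpleZipcode.major_city.label('city'),
        SimpleZipcode.state.label('state'),
        SimpleZipcode.county.label('county'),          # assumed extra columns; adjust to taste
        SimpleZipcode.population.label('population'),
    ).filter(
        SimpleZipcode.zipcode.in_(df['zip'].dropna().unique())
    )
).selectable
zipcode_df = pd.read_sql_query(query, search.ses.connection(), index_col='zip')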
With the zip codes as the index in the zipcode_df dataframe, that's going to be a lot faster (zippier :-)) than using SQLAlchemy on each row individually.