I am having problems with the following code:
for i in np.arange(37, finaldf.shape[0]):
    # We choose to search by category with a 500m radius.
    radius = 500
    LIMIT = 100
    category_id = '4bf58dd8d48988d102951735'  # ID for Accessory stores

    latitude = finaldf['Latitude'][i]
    longitude = finaldf['Longitude'][i]

    # Define the corresponding URL
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(
        CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, category_id, radius, LIMIT)

    # Send the GET Request
    results = requests.get(url).json()

    # Get relevant part of JSON and transform it into a pandas dataframe
    # assign relevant part of JSON to venues
    venues = results['response']['venues']

    # transform venues into a dataframe
    dataframe = json_normalize(venues)
    dataframe.head()

    # keep only columns that include venue name, and anything that is associated with location
    filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
    dataframe_filtered = dataframe.loc[:, filtered_columns]

    # function that extracts the category of the venue
    def get_category_type(row):
        try:
            categories_list = row['categories']
        except:
            categories_list = row['venue.categories']

        if len(categories_list) == 0:
            return None
        else:
            return categories_list[0]['name']

    # filter the category for each row
    dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

    # clean column names by keeping only last term
    dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

    print(str(i) + ') The number of shops in '
          + finaldf['Neighbourhood'][i] + ' is ' + str(dataframe_filtered.shape[0]) + '\n')
    N_shop.append(dataframe_filtered.shape[0])
This loop counts the number of stores in each neighbourhood, but when I run it I get the following error:
KeyError Traceback (most recent call last)
<ipython-input-109-94d4817fe1e7> in <module>
6 category_id = '4bf58dd8d48988d102951735' #ID for Accessory stores
7
----> 8 latitude = finaldf['Latitude'][i]
9 longitude = finaldf['Longitude'][i]
10
/opt/conda/envs/Python36/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
866 key = com.apply_if_callable(key, self)
867 try:
--> 868 result = self.index.get_value(self, key)
869
870 if not is_scalar(result):
/opt/conda/envs/Python36/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
4372 try:
4373 return self._engine.get_value(s, k,
-> 4374 tz=getattr(series.dtype, 'tz', None))
4375 except KeyError as e1:
4376 if len(self) > 0 and (self.holds_integer() or self.is_boolean()):
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 38
finaldf has 5 columns and 39 rows, holding the postal code, district, neighbourhood, longitude and latitude, which I will later use to plot the locations on a map. I have checked for null values and for values stored in a different format, but I have not found any. What is wrong? From what I understand, one row (38) is causing the error. Thanks for the help.
This question is hard to answer without knowing the form of your dataframe, but I would guess that your index contains integers but not the specific value 38, perhaps as the result of earlier filtering. The traceback passes through Int64HashTable, so the index is integer-typed, and with an integer index finaldf['Latitude'][i] treats 38 as a label to look up, not as a row position.
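To illustrate that guess, here is a minimal sketch with a made-up dataframe. The name toy_df and all of its values are hypothetical stand-ins, not your data; the point is only that filtering keeps the original labels, so a label can go missing even though the row count still covers that number.

```python
import pandas as pd

# Hypothetical stand-in for finaldf: 5 rows with a default 0..4 index.
toy_df = pd.DataFrame({
    'Neighbourhood': ['A', 'B', 'C', 'D', 'E'],
    'Latitude': [43.1, 43.2, 43.3, 43.4, 43.5],
})

# Filtering drops a row but keeps the original labels, leaving a gap at 1.
toy_df = toy_df[toy_df['Neighbourhood'] != 'B']
index_labels = list(toy_df.index)  # labels that survived the filter

# The frame still has 4 rows, so a loop over range(4) would ask for 1,
# but with an integer index [1] is a label lookup and label 1 is gone.
try:
    toy_df['Latitude'][1]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(index_labels, lookup_failed)
```

Printing finaldf.index on your own dataframe should show in the same way whether 38 is actually one of its labels.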
From the pandas indexing documentation:
.ix offers a lot of magic on the inference of what the user wants to do. To wit, .ix can decide to index positionally OR via labels depending on the data type of the index. This has caused quite a bit of user confusion over the years.
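Your for-loop suggests you want to iterate over rows by position, so you could change it to use .iloc, which always indexes positionally (the finaldf['Neighbourhood'][i] lookup in your print call needs the same change):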
for i in np.arange(37, finaldf.shape[0]):
    latitude = finaldf['Latitude'].iloc[i]  # Use .iloc[i]
    longitude = finaldf['Longitude'].iloc[i]
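As a rough check of how positional access behaves, the sketch below uses a hypothetical toy_df whose labels (10, 20, 30) deliberately differ from its positions. It also shows reset_index(drop=True), which is an alternative I am adding here rather than something your code already does: it rebuilds a 0..n-1 index so that labels and positions agree again.

```python
import pandas as pd

# Hypothetical frame whose labels (10, 20, 30) differ from positions (0, 1, 2).
toy_df = pd.DataFrame(
    {'Latitude': [43.1, 43.2, 43.3], 'Longitude': [-79.1, -79.2, -79.3]},
    index=[10, 20, 30],
)

# .iloc always counts rows from 0, regardless of what the labels are.
by_position = [toy_df['Latitude'].iloc[i] for i in range(toy_df.shape[0])]

# Alternative: rebuild a clean 0..n-1 index so labels and positions agree.
clean_df = toy_df.reset_index(drop=True)
by_label = [clean_df['Latitude'][i] for i in range(clean_df.shape[0])]

print(by_position == by_label)
```

Which of the two to prefer depends on whether the original labels carry meaning you want to keep; .iloc leaves the dataframe untouched.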
If you wanted to rewrite that in a clever way, you could try:
for lat, long in finaldf[['Latitude', 'Longitude']].iloc[37:].itertuples(index=False):
    # Use lat, long
    ...
This relies on Python's tuple unpacking: itertuples(index=False) yields one (Latitude, Longitude) tuple per row, so each row unpacks directly into lat and long.
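Here is a small sketch of iterating over coordinate pairs from a positional slice, again on a hypothetical toy_df with a gappy index. It uses itertuples(index=False) so that each row unpacks into exactly two values; iterrows() yields (index, Series) pairs instead, so a two-name unpack there would receive the index label and the whole row.

```python
import pandas as pd

# Hypothetical frame with non-contiguous labels, as might be left by filtering.
toy_df = pd.DataFrame(
    {
        'Neighbourhood': ['A', 'B', 'C', 'D'],
        'Latitude': [43.1, 43.2, 43.3, 43.4],
        'Longitude': [-79.1, -79.2, -79.3, -79.4],
    },
    index=[0, 2, 5, 9],
)

coords = []
# .iloc[2:] slices by position, so the gaps in the labels do not matter.
for lat, long in toy_df[['Latitude', 'Longitude']].iloc[2:].itertuples(index=False):
    coords.append((lat, long))

print(coords)
```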