I have a dataset with venue_id (about 1,500 of them), physical address, latitude, and longitude.
I want to create a column named 'overlap' that counts, for each venue_id, the number of other venue_ids that overlap with it, if any.
For example, for venue_id == 1, if any other venue_ids overlap with it in terms of a 2 km radius, count them and save the count in the 'overlap' column. If 2 venue_ids overlap with venue_id == 1, 'overlap' would equal 2.
So far, I have tried visualizing it with folium:

    import pandas as pd
    import folium

    # df is my DataFrame with the columns 'lat' and 'lng'
    m = folium.Map(location=[37.553975551114476, 126.97545224493899],
                   zoom_start=10)
    locations = df['lat'], df['lng']
    df = df.dropna(how='any')
    print(df.isna().sum())
    for _, row in df.iterrows():
        folium.Circle(location=[row['lat'], row['lng']],
                      radius=2000).add_to(m)
    m.save("index.html")

The problem is that, if I understand correctly, folium's Circle draws its radius in pixels, so the circle size is tied to the zoom level I selected when creating the base map.
P.S. There is no need to actually visualize the result, as long as the 2 km radius is calculated correctly. I only tried visualizing it with folium to see whether I could count the overlapping circles manually.
Thanks in advance.
It sounds like the goal here is to determine, for each point, how many other points in your dataset are within 2 km of it. The Haversine distance is the way to go in this case. Since you're only interested in a short distance and you have a relatively small number of points, the Stack Overflow answer linked in the code comment below provides the central function. Then it's just a matter of applying it to your data. Here's one approach to do that:

    import pandas as pd
    import numpy as np

    # function from https://stackoverflow.com/a/29546836/4325492
    def haversine_np(lon1, lat1, lon2, lat2):
        """
        Calculate the great circle distance (in km) between two points
        on the earth (specified in decimal degrees).

        Array arguments must be of equal length or broadcastable.
        """
        lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
        dlon = lon2 - lon1
        dlat = lat2 - lat1
        a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2
        c = 2 * np.arcsin(np.sqrt(a))
        km = 6367 * c  # approximate Earth radius in km
        return km

    # generate some sample data
    lng1, lat1 = np.random.randn(2, 1000)
    df = pd.DataFrame(data={'lng': lng1, 'lat': lat1})

    # Apply to the data: for each row, count the points within 2 km,
    # then subtract 1 to exclude the row itself
    df['overlap'] = df.apply(
        lambda x: (haversine_np(x['lng'], x['lat'], df['lng'], df['lat']) <= 2).sum() - 1,
        axis=1,
    )

When applying the function, we count the number of points whose distance from the current row is <= 2 km. We subtract 1 because the function is applied to every row against all rows, including itself, and each point is 0 km from itself.
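With around 1,500 venues, the same count can probably also be computed without a row-wise apply, by building the full pairwise distance matrix with NumPy broadcasting. The following is only a sketch of that idea: the function name `count_neighbors`, the `radius_km` parameter, and the 6371 km Earth radius are assumptions made here for illustration, not part of the linked answer.

```python
import numpy as np

# Mean Earth radius in km (an assumption; the function above uses 6367)
EARTH_RADIUS_KM = 6371.0

def count_neighbors(lat, lng, radius_km=2.0):
    """Return, for each point, how many other points lie within radius_km.

    lat and lng are equal-length sequences in decimal degrees.
    """
    lat = np.radians(np.asarray(lat, dtype=float))
    lng = np.radians(np.asarray(lng, dtype=float))

    # Broadcasting a column against a row gives n x n matrices of differences
    dlat = lat[:, None] - lat[None, :]
    dlng = lng[:, None] - lng[None, :]

    # Haversine formula applied to every pair of points at once
    a = (np.sin(dlat / 2.0) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlng / 2.0) ** 2)
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

    within = dist_km <= radius_km
    np.fill_diagonal(within, False)  # a point is not its own neighbour
    return within.sum(axis=1)
```

Assuming the DataFrame has 'lat' and 'lng' columns as in the question, this could be used as `df['overlap'] = count_neighbors(df['lat'], df['lng'])`. The matrix has n x n entries, which is fine for about 1,500 points but would get memory-hungry for much larger datasets. Note also that if 'overlap' is meant literally as two 2 km circles intersecting, the centres only need to be within 4 km of each other, in which case `radius_km=4.0` may be the threshold you want.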