I have a set of names with longitude and latitude coordinates that I am trying to run a nearest neighbour search on.
name lat long
0 Veronica Session 11.463798 14.136215
1 Lynne Donahoo 44.405370 -82.350737
2 Debbie Hanley 14.928905 -91.344523
3 Lisandra Earls 68.951464 -138.976699
4 Sybil Leef -1.678356 33.959323
Currently I am using sklearn.neighbors to run a search on the data, but I receive a type error. The data is stored in a dataframe.
TypeError: NearestNeighbors.__init__() takes 1 positional argument but 2 positional arguments (and 2 keyword-only arguments) were given
Additionally, I need the end results to retain the original names along with their new coordinate order, something which I don't think my current code does. I've been using the sklearn documentation but have hit a bit of a wall. Help would be appreciated.
coords = list(zip(df['lat'],df['long']))
btree = sklearn.neighbors.NearestNeighbors(coords,algorithm='ball_tree',metric='haversine')
btree.fit(coords)
df['optimised_route']=btree
I have a separate loop for calculating haversine distance manually, which can be brought in if required.
The comment pointing out that coords should not be passed as an argument to NearestNeighbors is correct. Instead, the lat and long columns should be passed to the .fit() method:
from io import StringIO
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors
lat_long_file = StringIO("""name,lat,long
Veronica Session,11.463798,14.136215
Lynne Donahoo,44.405370,-82.350737
Debbie Hanley,14.928905,-91.344523
Lisandra Earls,68.951464,-138.976699
Sybil Leef,-1.678356,33.959323
""")
df = pd.read_csv(lat_long_file)
# scikit-learn's haversine metric expects [latitude, longitude] in radians
nn = NearestNeighbors(metric="haversine")
nn.fit(np.radians(df[["lat", "long"]]))
Now, for a new example at 11.5,15.1 we can query the NearestNeighbors object for indexes. For example, use it to compute the two nearest neighbors and look up the resulting indexes, nearest[0], in the original data frame:
new_example = pd.DataFrame({"lat": [11.5], "long": [15.1]})
# convert the query point to radians as well, to match the fitted data
nearest = nn.kneighbors(np.radians(new_example), n_neighbors=2, return_distance=False)
print(df.iloc[nearest[0]])
This shows that the two closest points are at 11.46,14.14 and -1.68,33.96:
name lat long
0 Veronica Session 11.463798 14.136215
4 Sybil Leef -1.678356 33.959323
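To keep the original names alongside the neighbour results, as the question asks, one option is to look up each row's nearest other row and store that row's name and distance as new columns. The following is only a sketch under some assumptions: the helper name nearest_other, the added column names nearest_name and nearest_km, and the 6371 km mean Earth radius used to convert the unit-sphere distance into kilometres are all illustrative choices, not part of the original code. It relies on scikit-learn's haversine metric taking [latitude, longitude] in radians, and on kneighbors() excluding each point from its own neighbours when called without a query.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, used only for unit conversion


def nearest_other(df):
    """Return a copy of df with each row's nearest other row attached.

    Hypothetical helper: adds 'nearest_name' and 'nearest_km' columns.
    """
    # The haversine metric works on [lat, long] pairs expressed in radians.
    coords_rad = np.radians(df[["lat", "long"]].to_numpy())
    nn = NearestNeighbors(n_neighbors=1, metric="haversine").fit(coords_rad)

    # Called without a query, kneighbors() returns the neighbours of the
    # fitted points themselves and does not count a point as its own neighbour.
    dist, idx = nn.kneighbors()

    out = df.copy()
    out["nearest_name"] = df["name"].to_numpy()[idx[:, 0]]
    # Distances come back in radians on the unit sphere; scale to kilometres.
    out["nearest_km"] = dist[:, 0] * EARTH_RADIUS_KM
    return out


df = pd.DataFrame(
    {
        "name": ["Veronica Session", "Lynne Donahoo", "Debbie Hanley",
                 "Lisandra Earls", "Sybil Leef"],
        "lat": [11.463798, 44.405370, 14.928905, 68.951464, -1.678356],
        "long": [14.136215, -82.350737, -91.344523, -138.976699, 33.959323],
    }
)
result = nearest_other(df)
print(result[["name", "nearest_name", "nearest_km"]])
```

Because the results are written back as columns on a copy of the data frame, the name column and the original row order stay intact. Note that this gives each point's closest neighbour, which is not the same thing as an optimised route through all points.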