python pandas scikit-learn haversine

Nearest neighbour - sklearn for latitude and longitude


I have a set of names with longitude and latitude coordinates that I am trying to run a nearest neighbour search on.

                            name        lat        long
            0   Veronica Session  11.463798   14.136215
            1      Lynne Donahoo  44.405370  -82.350737
            2      Debbie Hanley  14.928905  -91.344523
            3     Lisandra Earls  68.951464 -138.976699
            4         Sybil Leef  -1.678356   33.959323

Currently I am using sklearn.neighbors to run a search on the data, but I receive a type error. The data is stored in a dataframe.

TypeError: NearestNeighbors.__init__() takes 1 positional argument but 2 positional arguments (and 2 keyword-only arguments) were given

Additionally, I need the end results to retain the original names along with their new coordinate order, something which I don't think my current code does. I've been using the sklearn documentation but have hit a bit of a wall. Help would be appreciated.

coords = list(zip(df['lat'],df['long']))
btree = sklearn.neighbors.NearestNeighbors(coords,algorithm='ball_tree',metric='haversine')
btree.fit(coords)

df['optimised_route']=btree

I have a separate loop for calculating haversine distance manually which can be brought in if required.


Solution

  • The comment pointing out that coords should not be passed as an argument to NearestNeighbors is correct: the constructor accepts only keyword configuration options, which is why passing coords positionally raises the TypeError. Instead, pass the lat and long columns to the .fit() method:

    from io import StringIO
    import numpy as np
    import pandas as pd
    from sklearn.neighbors import NearestNeighbors
    
    lat_long_file = StringIO("""name,lat,long
    Veronica Session,11.463798,14.136215
    Lynne Donahoo,44.405370,-82.350737
    Debbie Hanley,14.928905,-91.344523
    Lisandra Earls,68.951464,-138.976699
    Sybil Leef,-1.678356,33.959323
    """)
    
    df = pd.read_csv(lat_long_file)
    
    # The haversine metric expects [latitude, longitude] pairs in radians,
    # so convert the degree values before fitting.
    nn = NearestNeighbors(metric="haversine")
    nn.fit(np.radians(df[["lat", "long"]]))
    

    To find the neighbours of a new point at (11.5, 15.1), convert it to radians in the same way and query the fitted NearestNeighbors object. kneighbors returns row positions, so nearest[0] can be used to look up the two nearest neighbours, names included, in the original data frame:

    new_example = pd.DataFrame({"lat": [11.5], "long": [15.1]})
    
    # The query point must be in radians too, with the same column order.
    nearest = nn.kneighbors(np.radians(new_example), n_neighbors=2, return_distance=False)
    
    print(df.iloc[nearest[0]])
    

    This shows that the two closest points are at (11.46, 14.14) and (-1.68, 33.96):

                   name        lat       long
    0  Veronica Session  11.463798  14.136215
    4        Sybil Leef  -1.678356  33.959323
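
The question also asks for the original names to stay attached to the results. As a minimal sketch of one way to do that, the snippet below finds, for every row, the nearest other point and stores its name and distance next to the original row. It assumes the same five sample rows; the column names nearest_name and nearest_km and the constant EARTH_RADIUS_KM (a mean Earth radius of 6371 km) are illustrative choices, not part of the original post. Note that this is a per-point nearest-neighbour lookup, not a route optimiser.

```python
from io import StringIO

import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Assumed mean Earth radius, used to turn radians into kilometres.
EARTH_RADIUS_KM = 6371.0

df = pd.read_csv(StringIO("""name,lat,long
Veronica Session,11.463798,14.136215
Lynne Donahoo,44.405370,-82.350737
Debbie Hanley,14.928905,-91.344523
Lisandra Earls,68.951464,-138.976699
Sybil Leef,-1.678356,33.959323
"""))

# The haversine metric expects [latitude, longitude] pairs in radians.
coords_rad = np.radians(df[["lat", "long"]].to_numpy())

nn = NearestNeighbors(n_neighbors=1, algorithm="ball_tree", metric="haversine")
nn.fit(coords_rad)

# Calling kneighbors() without query data returns the neighbours of the
# fitted points themselves, and a point is not counted as its own neighbour.
distances, indices = nn.kneighbors(n_neighbors=1)

result = df.copy()
# indices holds row positions, so use them to look up the matching names.
result["nearest_name"] = df["name"].to_numpy()[indices[:, 0]]
# Distances come back in radians; scale by the Earth's radius to get km.
result["nearest_km"] = distances[:, 0] * EARTH_RADIUS_KM

print(result)
```

Because the lookup goes through row positions, the name column is never separated from its coordinates, and any further columns in df are carried along unchanged.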