python pandas gps pandas-groupby distance

Average measure for all GPS points within certain range


I have a pandas dataframe with latitude, longitude, and a measure for 100K+ GPS points.

import pandas as pd

df = pd.DataFrame({
    'lat': [41.260637, 45.720185, 45.720189, 45.720214, 45.720227, 46.085716, 46.085718, 46.085728, 46.085730, 46.085732],
    'lng': [2.825920, 3.068014, 3.068113, 3.067929, 3.068199, 3.341655, 3.341534, 3.341476, 3.341546, 3.341476],
    'measure': [6.30000, -0.375000, -0.375000, -0.375000, -0.375000, 0.000000, 0.000000, 0.555556, 0.714286, 0.645833]})

What I want to do is calculate, for each of these points, the average of the measure column for all points within a range of 10 meters.

I know how to calculate the distance between two points using geopy:

from geopy.distance import distance
distance([df.lat[3], df.lng[3]], [df.lat[4], df.lng[4]]).m

21.06426497936181

But how would I go about iterating over the rows, selecting the points within the 10 m range, and averaging the measure?

I'm guessing some sort of groupby, but can't figure out how.


Solution

  • Note that in this example each point is always included in its own neighborhood, so its own measure is part of the average. If you want to exclude the point itself, you will need to modify that step.

    We can use scikit-learn's BallTree:

    import pandas as pd
    from sklearn.neighbors import BallTree
    import numpy as np
    

    With your sample data:

    df = pd.DataFrame({
        'lat': [41.260637, 45.720185, 45.720189, 45.720214, 45.720227, 46.085716, 46.085718, 46.085728, 46.085730, 46.085732],
        'lng': [2.825920, 3.068014, 3.068113, 3.067929, 3.068199, 3.341655, 3.341534, 3.341476, 3.341546, 3.341476],
        'measure': [6.30000, -0.375000, -0.375000, -0.375000, -0.375000, 0.000000, 0.000000, 0.555556, 0.714286, 0.645833]})
    

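    We can build the tree from the coordinates converted to radians, since the haversine metric expects [lat, lng] pairs in radians: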

    gps_pairs = df[["lat", "lng"]].values
    radians =  np.radians(gps_pairs)
    
    tree = BallTree(radians, leaf_size=15, metric='haversine')
    

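    The haversine metric returns distances as angles in radians on a unit sphere, so we divide 10 m by the Earth's mean radius (about 6,371 km) to get an approximate query radius: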

    distance_in_meters = 10
    earth_radius = 6371000
        
    radius = distance_in_meters / earth_radius
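
As a sanity check on the scaling above, here is a minimal standard-library sketch of the haversine formula. The helper name `haversine_m` and the constant `EARTH_RADIUS_M` are illustrative and not part of the original answer. It compares rows 3 and 4 of the sample data (the pair measured with geopy in the question) against the 10 m radius.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters, same value as above


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    central_angle = 2 * math.asin(math.sqrt(a))  # angle in radians on a unit sphere
    return central_angle * EARTH_RADIUS_M


# Query radius for 10 m, expressed as an angle in radians
radius = 10 / EARTH_RADIUS_M

# Rows 3 and 4 of the sample data
d = haversine_m(45.720214, 3.067929, 45.720227, 3.068199)

# Compare in the same units the tree uses (radians)
within_10m = d / EARTH_RADIUS_M <= radius
print(round(d, 2), within_10m)
```

The distance comes out close to the geopy figure in the question but not identical, because geopy's `distance` uses an ellipsoidal model by default while haversine assumes a sphere. For a 10 m radius the difference is small, which is why the answer calls the radius approximate.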
    

    Query the tree with that radius:

    is_within, distances = tree.query_radius(radians, r=radius, count_only=False, return_distance=True) 
    

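    is_within is an array with one entry per query point; each entry holds the indices of the points that fall within 10 meters of it (including the point itself). distances holds the matching distances, in radians.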
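
To make the shape of the query result concrete, here is a small sketch using only rows 1 to 4 of the sample data. The variable names (`points_deg`, `distances_m`) are illustrative. It assumes the `sort_results` option of `query_radius`, which orders each neighbor list by increasing distance, and scales the returned angles back to meters.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

# Rows 1-4 of the sample data, as [lat, lng] in degrees
points_deg = np.array([
    [45.720185, 3.068014],
    [45.720189, 3.068113],
    [45.720214, 3.067929],
    [45.720227, 3.068199],
])
points_rad = np.radians(points_deg)
tree = BallTree(points_rad, metric="haversine")

# One array of indices and one array of distances per query point
indices, distances = tree.query_radius(
    points_rad, r=10 / EARTH_RADIUS_M, return_distance=True, sort_results=True
)

# Distances come back as angles in radians; scale them to meters
distances_m = [d * EARTH_RADIUS_M for d in distances]

for i, (idx, dist) in enumerate(zip(indices, distances_m)):
    print(i, idx.tolist(), np.round(dist, 2).tolist())
```

Each point appears first in its own list with a distance of 0, which is why the average includes the point itself unless it is filtered out.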

    Now you can calculate the average measure with:

    measures = df['measure'].values
    
    average_measure_for_withins = np.array([np.mean(measures[withins]) for withins in is_within])
    

    Finally, you can add the result to the DataFrame as a new column:

    df['average_for_withins'] = average_measure_for_withins
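
The note at the top of this answer mentions that each point is included in its own average. One possible way to exclude it, sketched below under the assumption that a point with no other neighbors should get NaN, is to drop the query point's own index from each neighbor list before averaging. The column name `average_excluding_self` is illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

df = pd.DataFrame({
    'lat': [41.260637, 45.720185, 45.720189, 45.720214, 45.720227,
            46.085716, 46.085718, 46.085728, 46.085730, 46.085732],
    'lng': [2.825920, 3.068014, 3.068113, 3.067929, 3.068199,
            3.341655, 3.341534, 3.341476, 3.341546, 3.341476],
    'measure': [6.30000, -0.375000, -0.375000, -0.375000, -0.375000,
                0.000000, 0.000000, 0.555556, 0.714286, 0.645833]})

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

radians = np.radians(df[['lat', 'lng']].to_numpy())
tree = BallTree(radians, metric='haversine')
neighbours = tree.query_radius(radians, r=10 / EARTH_RADIUS_M)

measures = df['measure'].to_numpy()

# Remove only the query point's own index; other points at the same
# location are still counted. Points with no other neighbour get NaN
# (an assumption; pick whatever default suits your data).
df['average_excluding_self'] = np.array([
    measures[idx[idx != i]].mean() if len(idx) > 1 else np.nan
    for i, idx in enumerate(neighbours)
])
print(df)
```

With the sample data, the first row is isolated and therefore gets NaN, while the remaining rows are averaged over their neighbors only.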