Average measure for all GPS points within certain range

I have a pandas dataframe with latitude, longitude, and a measure for 100K+ GPS points.

df = pd.DataFrame({'lat': [41.260637, 45.720185, 45.720189, 45.720214, 45.720227, 46.085716, 46.085718, 46.085728, 46.085730, 46.085732], 
          'lng': [2.825920, 3.068014, 3.068113, 3.067929, 3.068199, 3.341655, 3.341534, 3.341476, 3.341546, 3.341476], 
      'measure': [6.30000, -0.375000, -0.375000, -0.375000, -0.375000, 0.000000, 0.000000, 0.555556, 0.714286, 0.645833]})

What I want to do is calculate, for each of these points, the average of the measure column for all points within a range of 10 meters.

I know how to calculate the distance between two points using geopy:

from geopy.distance import distance
distance((df.lat[3], df.lng[3]), (df.lat[4], df.lng[4])).m

21.06426497936181
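For reference, here is a minimal sketch of the haversine (great-circle) distance in NumPy. It assumes a spherical Earth with a mean radius of 6,371 km, which is the same model the BallTree approach relies on. geopy's default `distance` is geodesic on an ellipsoid, so the two results differ slightly. `haversine_m` and `EARTH_RADIUS_M` are illustrative names, not part of any library.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical model)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between points given in degrees.

    Accepts scalars or NumPy arrays (broadcasting applies).
    """
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlng / 2) ** 2
    # Clip guards against tiny floating-point overshoots outside [0, 1]
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


# Points 3 and 4 of the sample data (same pair as the geopy call above)
d = haversine_m(45.720214, 3.067929, 45.720227, 3.068199)
print(f"{d:.2f} m")
```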

But how would I go about iterating over the rows, selecting the points within the 10 m range, and averaging their measure?

I'm guessing some sort of groupby, but I can't figure out how.
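A direct, brute-force answer to the question is to compute every pairwise distance and then average the measures under a boolean mask. The sketch below does this with NumPy broadcasting rather than a Python loop over rows. It uses a hypothetical helper, `brute_force_neighbor_mean`, and assumes the spherical-Earth haversine distance. Because it builds n × n matrices, each float64 matrix would need about 80 GB for 100K points, so it is only practical as a check on small samples.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000  # spherical-Earth assumption

df = pd.DataFrame({'lat': [41.260637, 45.720185, 45.720189, 45.720214, 45.720227, 46.085716, 46.085718, 46.085728, 46.085730, 46.085732],
                   'lng': [2.825920, 3.068014, 3.068113, 3.067929, 3.068199, 3.341655, 3.341534, 3.341476, 3.341546, 3.341476],
                   'measure': [6.30000, -0.375000, -0.375000, -0.375000, -0.375000, 0.000000, 0.000000, 0.555556, 0.714286, 0.645833]})


def brute_force_neighbor_mean(df, radius_m=10.0):
    """Hypothetical O(n^2) baseline: mean of `measure` over points within radius_m."""
    lat = np.radians(df["lat"].to_numpy())
    lng = np.radians(df["lng"].to_numpy())
    # Broadcasting a column against a row gives all (i, j) pairs: shape (n, n)
    dlat = lat[:, None] - lat[None, :]
    dlng = lng[:, None] - lng[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlng / 2) ** 2)
    dist_m = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    within = dist_m <= radius_m  # boolean mask; the diagonal (self) is True
    measure = df["measure"].to_numpy()
    # Row-wise mean over the masked neighbors
    return (within * measure).sum(axis=1) / within.sum(axis=1)


df["avg_10m"] = brute_force_neighbor_mean(df)
print(df)
```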

Solution

In this example, each point always lies within its own radius, so its own measure is included in its average. You would need to modify that part if you want to exclude the point itself.

We can use scikit-learn's BallTree, which supports the haversine (great-circle) metric:

import pandas as pd
from sklearn.neighbors import BallTree
import numpy as np

With your sample data:

df = pd.DataFrame({'lat': [41.260637, 45.720185, 45.720189, 45.720214, 45.720227, 46.085716, 46.085718, 46.085728, 46.085730, 46.085732], 
          'lng': [2.825920, 3.068014, 3.068113, 3.067929, 3.068199, 3.341655, 3.341534, 3.341476, 3.341546, 3.341476], 
      'measure': [6.30000, -0.375000, -0.375000, -0.375000, -0.375000, 0.000000, 0.000000, 0.555556, 0.714286, 0.645833]})

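We can build the tree from the coordinates converted to radians, because the haversine metric expects [latitude, longitude] pairs in radians: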

gps_pairs = df[["lat", "lng"]].values
radians = np.radians(gps_pairs)

tree = BallTree(radians, leaf_size=15, metric='haversine')
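As a quick check of the units: with `metric='haversine'`, the tree returns distances as central angles in radians, so multiplying by the Earth's radius gives meters. The sketch below assumes a 6,371 km radius and queries the same two points measured with geopy in the question. The result should be close to geopy's 21.06 m but not identical, because haversine treats the Earth as a sphere.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_M = 6_371_000  # spherical-Earth assumption

# Points 3 and 4 of the sample data as [lat, lng] in degrees
pts_deg = np.array([[45.720214, 3.067929],
                    [45.720227, 3.068199]])
tree = BallTree(np.radians(pts_deg), metric="haversine")

# query() returns (distances, indices); haversine distances are angles in radians
dist_rad, idx = tree.query(np.radians(pts_deg[:1]), k=2)
dist_m = dist_rad[0, 1] * EARTH_RADIUS_M
print(idx[0], f"{dist_m:.2f} m")
```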

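The haversine metric measures distances as angles on the unit sphere, so the 10 m radius must be converted to radians by dividing it by the Earth's radius. This is approximate, since it treats the Earth as a sphere: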

distance_in_meters = 10
earth_radius = 6371000  # mean Earth radius in meters

radius = distance_in_meters / earth_radius
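Written out, the conversion is the arc-length relation $s = R\theta$ solved for the angle $\theta$, with $R$ the mean Earth radius used above:

```latex
\theta_{\max} = \frac{s}{R} = \frac{10\ \mathrm{m}}{6\,371\,000\ \mathrm{m}} \approx 1.57 \times 10^{-6}\ \mathrm{rad}
```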

Then query every point against the tree with that radius:

is_within, distances = tree.query_radius(radians, r=radius, count_only=False, return_distance=True)

For each point, is_within holds an array with the indices of all points within 10 meters, including the point itself. distances holds the matching distances in radians.
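To see what `query_radius` returns on the sample data, the sketch below prints each point's neighbor indices together with their distances converted back to meters. It passes `sort_results=True`, which orders neighbors by distance and requires `return_distance=True`. Because no two sample points coincide, each row starts with the point itself at distance 0. Passing `count_only=True` instead would return only the number of neighbors per point.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

EARTH_RADIUS_M = 6_371_000  # spherical-Earth assumption

df = pd.DataFrame({'lat': [41.260637, 45.720185, 45.720189, 45.720214, 45.720227, 46.085716, 46.085718, 46.085728, 46.085730, 46.085732],
                   'lng': [2.825920, 3.068014, 3.068113, 3.067929, 3.068199, 3.341655, 3.341534, 3.341476, 3.341546, 3.341476],
                   'measure': [6.30000, -0.375000, -0.375000, -0.375000, -0.375000, 0.000000, 0.000000, 0.555556, 0.714286, 0.645833]})

radians = np.radians(df[["lat", "lng"]].to_numpy())
tree = BallTree(radians, metric="haversine")
is_within, distances = tree.query_radius(
    radians, r=10 / EARTH_RADIUS_M, return_distance=True, sort_results=True
)

# Both results are object arrays holding one array per query point
for i, (idx, dist) in enumerate(zip(is_within, distances)):
    print(i, idx.tolist(), np.round(dist * EARTH_RADIUS_M, 1).tolist())
```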

Now you can calculate the average measure with:

measures = df['measure'].values

average_measure_for_withins = np.array([measures[withins].mean() for withins in is_within])
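Since the question guessed at a `groupby`, here is an alternative sketch of the averaging step. It flattens the ragged `is_within` arrays into a long table of (point, neighbor) pairs and lets pandas aggregate per point. The column names `point`, `neighbor` and `n_within` are illustrative. The means match the list comprehension above, and keeping the aggregation in pandas makes it easy to compute other statistics, such as the neighbor count, in the same call.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

EARTH_RADIUS_M = 6_371_000  # spherical-Earth assumption

df = pd.DataFrame({'lat': [41.260637, 45.720185, 45.720189, 45.720214, 45.720227, 46.085716, 46.085718, 46.085728, 46.085730, 46.085732],
                   'lng': [2.825920, 3.068014, 3.068113, 3.067929, 3.068199, 3.341655, 3.341534, 3.341476, 3.341546, 3.341476],
                   'measure': [6.30000, -0.375000, -0.375000, -0.375000, -0.375000, 0.000000, 0.000000, 0.555556, 0.714286, 0.645833]})

radians = np.radians(df[["lat", "lng"]].to_numpy())
tree = BallTree(radians, metric="haversine")
is_within = tree.query_radius(radians, r=10 / EARTH_RADIUS_M)  # indices only

# Flatten the ragged neighbor arrays into one row per (point, neighbor) pair
counts = np.array([len(w) for w in is_within])
pairs = pd.DataFrame({
    "point": np.repeat(np.arange(len(df)), counts),
    "neighbor": np.concatenate(list(is_within)),
})
pairs["measure"] = df["measure"].to_numpy()[pairs["neighbor"].to_numpy()]

# One groupby gives the mean and the neighbor count per point
stats = pairs.groupby("point")["measure"].agg(["mean", "count"])
df["average_for_withins"] = stats["mean"].reindex(np.arange(len(df))).to_numpy()
df["n_within"] = stats["count"].reindex(np.arange(len(df))).to_numpy()
print(df)
```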

You can then, for instance, add it to the DataFrame as a new column:

df['average_for_withins'] = average_measure_for_withins
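As noted at the start of the answer, each point is always part of its own average. Below is a minimal sketch of a reusable helper with an `include_self` switch. The function `neighbor_mean` and its parameters are hypothetical names. When the point itself is excluded, a point with no other neighbor in range gets NaN instead of its own value. Self-exclusion is done by position, so an exact duplicate coordinate stored in another row would still count as a neighbor.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

EARTH_RADIUS_M = 6_371_000  # spherical-Earth assumption


def neighbor_mean(df, radius_m=10.0, include_self=True,
                  lat_col="lat", lng_col="lng", value_col="measure"):
    """Hypothetical helper: mean of value_col over points within radius_m.

    With include_self=False the query point's own value is dropped, and a
    point with no other neighbor in range gets NaN.
    """
    coords = np.radians(df[[lat_col, lng_col]].to_numpy())
    tree = BallTree(coords, metric="haversine")
    neighbors = tree.query_radius(coords, r=radius_m / EARTH_RADIUS_M)
    values = df[value_col].to_numpy(dtype=float)
    out = np.full(len(df), np.nan)
    for i, idx in enumerate(neighbors):
        if not include_self:
            idx = idx[idx != i]  # drop only the query point, by position
        if idx.size:
            out[i] = values[idx].mean()
    return pd.Series(out, index=df.index)


df = pd.DataFrame({'lat': [41.260637, 45.720185, 45.720189, 45.720214, 45.720227, 46.085716, 46.085718, 46.085728, 46.085730, 46.085732],
                   'lng': [2.825920, 3.068014, 3.068113, 3.067929, 3.068199, 3.341655, 3.341534, 3.341476, 3.341546, 3.341476],
                   'measure': [6.30000, -0.375000, -0.375000, -0.375000, -0.375000, 0.000000, 0.000000, 0.555556, 0.714286, 0.645833]})
df["avg_with_self"] = neighbor_mean(df)
df["avg_without_self"] = neighbor_mean(df, include_self=False)
print(df)
```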