Tags: pandas, gis, geopy

Efficient way to do spatial analysis with pandas


I am running into problems doing spatial analysis with a pandas DataFrame. Right now I have a DataFrame with more than 1000 rows and the columns "user", "latitude", and "longitude".

Based on this dataset I would like to do some spatial analysis, such as creating a fourth column that counts all users within a 100 km range of each user.

Is there any way to do this efficiently?

Right now I use two for loops and geopy to calculate the distance in the following way:

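import geopy.distance

df_geo['Neighbors'] = 0

def getNeighbors():
    for i in df_geo.index:
        # .ix was removed from pandas; use label-based .loc instead
        p1 = (df_geo.loc[i, 'latitude'], df_geo.loc[i, 'longitude'])
        count = 0
        for i2 in df_geo.index:
            p2 = (df_geo.loc[i2, 'latitude'], df_geo.loc[i2, 'longitude'])
            # 'and', not '&': '&' binds tighter than the comparisons
            if i != i2 and geopy.distance.distance(p1, p2).km < 100:
                count += 1
        # assign through .loc to avoid chained assignment
        df_geo.loc[i, 'Neighbors'] = count

getNeighbors()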

Thank you

Andy


Solution

  • I think I would make a column for the Point objects:

    import geopy.distance
    from geopy import Point

    # axis=1 applies the function to each row rather than each column
    df['points'] = df.apply(lambda row: Point(row['latitude'], row['longitude']), axis=1)

    Then do something like:

    def neighbours_of(p, s):
        '''count points in s within 100km radius of p'''
        # sum() counts the True values; count() would count every row
        return s.apply(lambda p1: geopy.distance.distance(p, p1).km < 100).sum()

    df['neighbours'] = df['points'].apply(lambda p: neighbours_of(p, df['points']) - 1)
    # the -1 ensures we don't include p in the count
    

    However, an apply within an apply still computes every pairwise distance in Python, so it isn't going to be particularly efficient...
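
One way around the nested apply is to compute all pairwise distances at once with NumPy broadcasting. The sketch below is not the geopy approach from above: it swaps in the haversine formula, which treats the Earth as a sphere, so distances can differ slightly from geopy's ellipsoidal result and points very close to the 100 km boundary may be classified differently. The function name `count_neighbors` and the constant `EARTH_RADIUS_KM` are made up for this example.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumption: spherical Earth)

def count_neighbors(lat_deg, lon_deg, radius_km=100.0):
    """For each point, count the other points closer than radius_km."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))

    # Broadcasting builds (n, n) matrices of pairwise differences.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]

    # Haversine formula for great-circle distance.
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

    within = dist_km < radius_km
    np.fill_diagonal(within, False)  # a point is not its own neighbour
    return within.sum(axis=1)
```

With the DataFrame from the question, this could be used as `df['neighbours'] = count_neighbors(df['latitude'].to_numpy(), df['longitude'].to_numpy())`. The full distance matrix needs O(n²) memory, which is fine for a few thousand rows; for much larger data, a spatial index such as scikit-learn's `BallTree` with `metric='haversine'` is probably a better fit.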