I'm doing some calculations in Python. Say I have a DataFrame like this one, holding the lat/long of some points:
import pandas as pd
dfa=pd.DataFrame(([1,2],[1,3],[1,1],[1,4]), columns=['y','x'])
Before, I used distance_matrix from scipy.spatial and built another DataFrame with the code below. But it doesn't seem to calculate the distance between points accurately when they are lat/long coordinates.
from scipy.spatial import distance_matrix
pd.DataFrame(distance_matrix(dfa.values, dfa.values), index=dfa.index, columns=dfa.index)
Do you think it's possible to switch the calculation to geodesic? Here is what I've tried:
from geopy.distance import geodesic
pd.DataFrame(geodesic(dfa.values[0], dfa.values[0]).kilometers, index=dfa.index, columns=dfa.index)
# I don't know how to make the two [0] follow the row and column index
Any suggestions?
Given a list or list-like object locations, you can do:
# geodesic() returns a Distance object; .km extracts the value in kilometers
distances = pd.DataFrame([[geodesic(a, b).km for a in locations]
                          for b in locations])
This does redundant work, though, since it calculates the distance for both (a, b) and (b, a), even though the two should be the same. Depending on the cost of geodesic, you may find some of the following alternatives faster:
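# Compare positions rather than coordinates so each unordered pair is computed once;
# .km yields plain floats, which can be added to the zeros and to the transpose.
distances = pd.DataFrame([[geodesic(a, b).km if i > j else 0.0
                           for i, a in enumerate(locations)]
                          for j, b in enumerate(locations)])
distances = distances.add(distances.T)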
size = len(locations)
# Cells start out as NaN, which marks "not computed yet"
distances = pd.DataFrame(index=range(size), columns=range(size), dtype=float)
def get_distance(i, j):
    if i == j:
        return 0.0
    # Reuse the mirrored entry if it has already been filled in
    if pd.notna(distances.loc[j, i]):
        return distances.loc[j, i]
    return geodesic(locations[i], locations[j]).km
for i in range(size):
    for j in range(size):
        distances.loc[i, j] = get_distance(i, j)
You can also store the data as a dictionary whose keys are the output of itertools.combinations. There's also this article on creating a symmetric matrix class.
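The dictionary idea could look roughly like the sketch below. The helper names pairwise_distances and to_symmetric_frame are made up for illustration, and dist is assumed to be any two-argument function that returns a number, so the same code works with geodesic or with any other metric. Keys are index pairs (i, j) with i < j rather than the coordinates themselves, which keeps duplicate points from colliding.

```python
from itertools import combinations

import pandas as pd


def pairwise_distances(locations, dist):
    """Return {(i, j): dist(locations[i], locations[j])} for every i < j."""
    return {(i, j): dist(locations[i], locations[j])
            for i, j in combinations(range(len(locations)), 2)}


def to_symmetric_frame(pairs, size):
    """Expand the pair dictionary into a full symmetric DataFrame."""
    frame = pd.DataFrame(0.0, index=range(size), columns=range(size))
    for (i, j), d in pairs.items():
        frame.loc[i, j] = d  # one side of the diagonal
        frame.loc[j, i] = d  # mirrored entry
    return frame


# Hypothetical usage with the question's DataFrame (needs geopy installed):
#   from geopy.distance import geodesic
#   locations = dfa[['y', 'x']].values.tolist()   # (lat, lon) order assumed
#   pairs = pairwise_distances(locations, lambda a, b: geodesic(a, b).km)
#   distances = to_symmetric_frame(pairs, len(locations))
```

Each distance is computed exactly once, and the dictionary alone may be enough if you only ever look up individual pairs; the DataFrame step is only needed when you want the full matrix.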