I have a small set and a large set of locations and I need to know the geographic distance between the locations in these sets. An example of my datasets (they have the same structure, but one is larger):
     location        lat      long
0      Gieten  53.003312  6.763908
1    Godlinze  53.372605  6.814674
2  Grijpskerk  53.263894  6.306134
3   Groningen  53.219065  6.568008
In order to calculate the distances, I am using the haversine library. The haversine function wants the input to look like this:
lyon = (45.7597, 4.8422) # (lat, lon)
london = (51.509865, -0.118092)
paris = (48.8567, 2.3508)
new_york = (40.7033962, -74.2351462)
haversine_vector([lyon, london], [paris, new_york], Unit.KILOMETERS, comb=True)
after which the output looks like this:
array([[ 392.21725956,  343.37455271],
       [6163.43638211, 5586.48447423]])
How do I get the function to calculate a distance matrix from my two datasets without typing in all the locations separately? I have tried using dictionaries and I have tried looping over the locations in both datasets, but I can't seem to figure it out. I am pretty new to Python, so if someone has a solution that is easy to understand but not very elegant, I would prefer that over lambda functions and such. Thanks!
You are on the right track using haversine.haversine_vector.
Since I'm not sure how you got your dataset, this is a self-contained example using CSV datasets, but so long as you get lists of cities and coordinates somehow, you should be able to work it out.
Note that this does not compute distances between cities in the same dataset (e.g. not Helsinki <-> Turku). If you want that too, you could concatenate your two datasets into one and pass it to haversine_vector twice, i.e. as both the first and the second argument, again with comb=True.
import csv
import haversine


def read_csv_data(csv_data):
    # Split semicolon-separated "city;lat;lng" rows into two parallel lists.
    cities = []
    locations = []
    for (city, lat, lng) in csv.reader(csv_data.strip().splitlines(), delimiter=";"):
        cities.append(city)
        locations.append((float(lat), float(lng)))
    return cities, locations


cities1, locations1 = read_csv_data(
    """
Gieten;53.003312;6.763908
Godlinze;53.372605;6.814674
Grijpskerk;53.263894;6.306134
Groningen;53.219065;6.568008
"""
)

cities2, locations2 = read_csv_data(
    """
Turku;60.45;22.266667
Helsinki;60.170833;24.9375
"""
)

# comb=True computes every combination of the two lists (default unit: kilometers).
distance_matrix = haversine.haversine_vector(locations1, locations2, comb=True)

# Rows follow the second list and columns follow the first list,
# so the matrix is indexed as [index in cities2, index in cities1].
distances = {}
for y, city2 in enumerate(cities2):
    for x, city1 in enumerate(cities1):
        distances[city1, city2] = distance_matrix[y, x]

print(distances)
This prints out e.g.
{
    ("Gieten", "Turku"): 1251.501257597515,
    ("Godlinze", "Turku"): 1219.2012174066822,
    ("Grijpskerk", "Turku"): 1251.3232414412073,
    ("Groningen", "Turku"): 1242.8700308545722,
    ("Gieten", "Helsinki"): 1361.4575055586013,
    ("Godlinze", "Helsinki"): 1331.2811273683897,
    ("Grijpskerk", "Helsinki"): 1364.5464743878606,
    ("Groningen", "Helsinki"): 1354.8847270142198,
}
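Since you asked for something easy to follow rather than elegant, here is a rough sketch of the same idea with two plain nested loops and no matrix indexing at all. To keep it dependency-free it does not call the haversine library: haversine_km is my own hand-written version of the haversine formula, and the Earth radius constant is an assumption of mine, so the last decimals may differ slightly from the library's output. The helper names (haversine_km, pairwise_distances) are made up for this example.

import math

# Assumed mean Earth radius in kilometers (my choice for this sketch).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(point1, point2):
    # Great-circle distance in km between two (lat, lon) points in degrees.
    lat1, lon1 = math.radians(point1[0]), math.radians(point1[1])
    lat2, lon2 = math.radians(point2[0]), math.radians(point2[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pairwise_distances(cities_a, locations_a, cities_b, locations_b):
    # Build {(city_a, city_b): distance} for every combination of the two sets.
    distances = {}
    for city_a, loc_a in zip(cities_a, locations_a):
        for city_b, loc_b in zip(cities_b, locations_b):
            distances[city_a, city_b] = haversine_km(loc_a, loc_b)
    return distances


cities1 = ["Gieten", "Godlinze", "Grijpskerk", "Groningen"]
locations1 = [
    (53.003312, 6.763908),
    (53.372605, 6.814674),
    (53.263894, 6.306134),
    (53.219065, 6.568008),
]
cities2 = ["Turku", "Helsinki"]
locations2 = [(60.45, 22.266667), (60.170833, 24.9375)]

# Distances between the two datasets only.
between = pairwise_distances(cities1, locations1, cities2, locations2)

# Distances within and across both datasets: concatenate, then compare
# the combined list against itself.
all_cities = cities1 + cities2
all_locations = locations1 + locations2
within = pairwise_distances(all_cities, all_locations, all_cities, all_locations)

print(between["Groningen", "Turku"])
print(within["Helsinki", "Turku"])

This does one Python-level function call per pair, so for a really large dataset the vectorized haversine_vector call will likely be much faster. For a few hundred locations, the loop version should be perfectly fine and is easier to debug.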