I am working with an array created from a list of geographical coordinates describing a GPS trajectory. The data is like this:
[[-51.203018 -29.996149]
[-51.203018 -29.99625 ]
[-51.20266 -29.996229]
...,
[-51.64315 -29.717896]
[-51.643112 -29.717737]
[-51.642937 -29.717709]]
I want to calculate the geographic distances between consecutive rows (with the special condition that the first element is always zero, at the starting point). This would give me either a list of distances with len(distances) == coord_array.shape[0], or maybe a third column in the same array.
It is important to note that I already have a function that returns the distance between two points (two coordinate pairs), but I don't know how to apply it with a single array operation instead of looping through row pairs.
Currently I am doing the following to calculate segment distances in one new column and cumulative distances in another new column (lonlatarray is already shown above and distance(p1, p2) is already defined):
dists = [0.0]
for n in xrange(len(lonlat)-1):
    dists.append(distance(lonlat[n+1], lonlat[n]))
lonlatarray = numpy.array(lonlat).reshape((-1,2))
distsarray = numpy.array(dists).reshape((-1,1))
cumdistsarray = numpy.cumsum(distsarray).reshape((-1,1))
print numpy.hstack((lonlatarray, distsarray, cumdistsarray))
[[ -51.203018 -29.996149 0. 0. ]
[ -51.203018 -29.99625 7.04461338 7.04461338]
[ -51.20266 -29.996229 39.87928578 46.92389917]
...,
[ -51.64315 -29.717896 11.11669769 92529.72742791]
[ -51.643112 -29.717737 11.77016407 92541.49759198]
[ -51.642937 -29.717709 19.57670066 92561.07429263]]
My main question is: "How could I perform the distance function (which takes a pair of rows as argument) like an array operation instead of a loop?" (that is, how could I properly vectorize it)
Other on-topic questions would be:
Is there a way to get scipy.spatial.distance to "work for me" using geographic distance (haversine, great-circle distance)?
Also, I would appreciate some tips if I am doing anything unnecessarily complicated.
Thank you all, very much, for your interest.
It sounds like you need to have your original data lonlat represented as a pair of numpy arrays, then pass these arrays to a version of the function distance which accepts arrays.
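A minimal sketch of that first step, assuming lonlat holds (longitude, latitude) pairs in degrees as in the question. The names arr, lam and phi are only illustrative, and the conversion to radians is there because trigonometric formulas such as haversine expect angles in radians:

```python
import numpy as np

# lonlat: (longitude, latitude) pairs in degrees, in the column order of the question
lonlat = [(-51.203018, -29.996149),
          (-51.203018, -29.99625),
          (-51.20266, -29.996229)]

arr = np.asarray(lonlat, dtype=float)  # shape (N, 2)
lam = np.radians(arr[:, 0])            # longitudes, in radians
phi = np.radians(arr[:, 1])            # latitudes, in radians
```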
For example, looking up the definition of haversine distance, you can fairly easily turn it into a vectorised formula as follows:
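from numpy import cos, arcsin, sqrt

def haversine_pairwise(phi, lam):
    # phi, lam: arrays of latitudes and longitudes, in radians
    dphi = phi[1:]-phi[:-1]
    dlam = lam[1:]-lam[:-1]
    # haversine of the central angle between consecutive points
    h = 0.5*(1-cos(dphi)) + cos(phi[1:])*cos(phi[:-1])*0.5*(1-cos(dlam))
    # r is assumed to be a known constant (the sphere's radius)
    return 2*r*arcsin(sqrt(h))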
I'm not familiar with these formulas myself, but hopefully this shows you how you can do it for whichever formula you want. You would then use cumsum as you have already done. The array slicing syntax which I have used is covered in the NumPy indexing documentation, in case it's not clear.
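Putting the pieces together, here is a hedged end-to-end sketch of the whole table from the question, built without a Python loop. It assumes the input is an (N, 2) array of (longitude, latitude) in degrees and a spherical Earth with a mean radius of 6371 km. The names segment_distances and EARTH_RADIUS_M are made up for this example, and since the question's own distance function is not shown, the numbers need not match its output:

```python
import numpy as np

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius, in metres


def segment_distances(lonlat, r=EARTH_RADIUS_M):
    """Haversine distance between consecutive rows of an (N, 2) lon/lat array in degrees."""
    lonlat = np.asarray(lonlat, dtype=float).reshape(-1, 2)
    lam = np.radians(lonlat[:, 0])  # longitude
    phi = np.radians(lonlat[:, 1])  # latitude
    dphi = np.diff(phi)             # same as phi[1:] - phi[:-1]
    dlam = np.diff(lam)
    # haversine of the central angle between each pair of consecutive points
    h = np.sin(dphi / 2) ** 2 + np.cos(phi[1:]) * np.cos(phi[:-1]) * np.sin(dlam / 2) ** 2
    d = 2 * r * np.arcsin(np.sqrt(h))
    # prepend 0.0 so the starting point has zero distance and len(result) == N
    return np.concatenate(([0.0], d))


lonlat = np.array([[-51.203018, -29.996149],
                   [-51.203018, -29.996250],
                   [-51.202660, -29.996229]])
dists = segment_distances(lonlat)
table = np.column_stack((lonlat, dists, np.cumsum(dists)))
print(table)
```

Prepending the zero inside the function keeps the result the same length as the input, so it can be stacked next to the coordinates directly.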