Search code examples
pythonpython-2.7numpydistanceinsertion

Fastest way to compute distances between consecutive vectors with numpy/scipy


I have a line growth algorithm where I need to:

  • calculate the distances (euclidean) between consecutive vectors in an array
  • insert a new vector where a distance is greater than a specific threshold

enter image description here

I usually do this in a very naive manner (see code below) and would like to know how to compute distances between consecutive vectors the fastest way possible using numpy (and scipy if needed).

import math


threshold = 10
vectorList = [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16)]

for i in xrange(len(vectorList)):
    p1 = vectorList[i]
    p2 = vectorList[i+1]
    d = math.sqrt((p2[0] - p1[0])**2 + (p2[1] - p1[1])**2)
    if d >= threshold:
        pmid = ((p1[0] + p2[0]) * .5, (p1[1] + p2[1]) * .5)
        vectorList.insert(i+1, pmid)

EDIT: I've come up with the following workaround but I'm still concerned with the distance computation.

I would need to compute the distances between a vector and its next neighbor in a list instead of calculating a whole distance matrix (all vectors against each others) as I'm doing here.

import numpy as np

vectorList = [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16)]
arr = np.asarray(vectorList).astype(float)


dis = distance.cdist(arr, arr).diagonal(1)
idx = np.where(dis > 10)[0]
vec = (arr[idx] + arr[idx+1]) * .5
arr = np.insert(arr, idx+1, vec, 0)

# output
array([[ 0. , 10. ],[ 4. ,  8. ],[ 9. , 11. ],[14. , 14. ],[16. , 19. ],[25.5, 17.5],[35. , 16. ]])

Solution

  • In [209]: vectorList = [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16), (39,50)] 
    In [210]: vectorList                                                            
    Out[210]: [(0, 10), (4, 8), (14, 14), (16, 19), (35, 16), (39, 50)]
    

    I added a point, making 3 possible insertion points.

    In [212]: np.diff(vectorList, axis=0)                                           
    Out[212]: 
    array([[ 4, -2],
           [10,  6],
           [ 2,  5],
           [19, -3],
           [ 4, 34]])
    In [213]: np.sum(np.diff(vectorList, axis=0)**2,1)                              
    Out[213]: array([  20,  136,   29,  370, 1172])
    

    distances:

    In [214]: np.sqrt(np.sum(np.diff(vectorList, axis=0)**2,1))                     
    Out[214]: array([ 4.47213595, 11.66190379,  5.38516481, 19.23538406, 34.23448554])
    

    Mean values:

    In [216]: arr = np.array(vectorList)                                            
    In [217]: arr                                                                   
    Out[217]: 
    array([[ 0, 10],
           [ 4,  8],
           [14, 14],
           [16, 19],
           [35, 16],
           [39, 50]])
    
    In [218]: (arr[:-1]+arr[1:])/2                                                  
    Out[218]: 
    array([[ 2. ,  9. ],
           [ 9. , 11. ],
           [15. , 16.5],
           [25.5, 17.5],
           [37. , 33. ]])
    

    I could do something similar without diff:

    d = np.sqrt(np.sum((arr[1:]-arr[:-1])**2,1)) 
    

    The jumps that exceed the threshhold:

    In [224]: idx = np.nonzero(d>10)                                                
    In [225]: idx                                                                   
    Out[225]: (array([1, 3, 4]),)
    In [227]: _218[idx]       # the mean values to insert                                                            
    Out[227]: 
    array([[ 9. , 11. ],
           [25.5, 17.5],
           [37. , 33. ]])
    

    Use np.insert to insert all values.

    In [232]: np.insert(arr.astype(float), idx[0]+1, _227, axis=0)                  
    Out[232]: 
    array([[ 0. , 10. ],
           [ 4. ,  8. ],
           [ 9. , 11. ],
           [14. , 14. ],
           [16. , 19. ],
           [25.5, 17.5],
           [35. , 16. ],
           [37. , 33. ],
           [39. , 50. ]])