
Euclidean distance between lists inside a list


I would like to calculate the Euclidean distance between lists inside a list and, if that distance is smaller than a threshold, get the maximum element of those lists.

My solution gives me the distance between each pair of consecutive lists, but I want to compare every list with every other one. Basically, this probably needs a two-loop solution.

yhat = [[10 , 15, 200 ,220], [20 , 25, 200 ,230], [30 , 15, 200 ,230], [100 , 150, 230 ,300], [110 , 150, 240 ,300] ]

def euclidean(v1, v2):
    return sum((p-q)**2 for p, q in zip(v1, v2)) ** .5

it = iter(yhat)
prev = next(it)
ec =[]
for ind, ele in enumerate(it):
    ec.append(euclidean(ele, prev))
    prev = ele
ec

To summarize, I would like a new list xhat which contains elements:

xhat = [[30 , 35, 200 ,230], [110 , 150, 240 ,300] ]

Solution

  • You can use enumerate and itertools.combinations to make this rather short:

    from itertools import combinations
    
    # out[i][j] holds the distance between yhat[i] and yhat[j] for i < j
    out = {}
    for (i, v1), (j, v2) in combinations(enumerate(yhat), 2):
        out.setdefault(i, {})[j] = euclidean(v1, v2)
    
    out
    {0: {1: 17.320508075688775, 2: 22.360679774997898, 3: 183.3712082089225, 4: 190.3286631067428}, 
     1: {2: 14.142135623730951, 3: 166.80827317612278, 4: 173.8533865071371}, 
     2: {3: 170.07351351694948, 4: 176.4227876437735}, 
     3: {4: 14.142135623730951}}
    

    where out maps the indices in your input list to the distance between the vectors at those indices. You could get the maximum elements of the vectors whose distance is smaller than the threshold like this:

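    # Assumed value: the output below lists every pair, so the threshold
    # must exceed the largest distance (about 190.33).
    threshold = 200
    out = {}  # start from an empty mapping
    for (i, v1), (j, v2) in combinations(enumerate(yhat), 2):
        if euclidean(v1, v2) < threshold:
            out.setdefault(i, {})[j] = (max(v1), max(v2))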
    out
    {0: {1: (220, 230), 2: (220, 230), 3: (220, 300), 4: (220, 300)}, 
     1: {2: (230, 230), 3: (230, 300), 4: (230, 300)}, 
     2: {3: (230, 300), 4: (230, 300)}, 
     3: {4: (300, 300)}}
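
If the goal is a list like xhat, with one merged list per group of nearby vectors, one possible reading of the question is to group the lists that are linked by a distance below the threshold and take the element-wise maximum within each group. The sketch below follows that reading. The helper name group_and_max and the threshold of 25 are assumptions made for illustration and do not come from the question. The grouping uses a small union-find over the same combinations loop.

```python
from itertools import combinations


def euclidean(v1, v2):
    return sum((p - q) ** 2 for p, q in zip(v1, v2)) ** 0.5


def group_and_max(vectors, threshold):
    """Group vectors linked by a distance below threshold and return
    the element-wise maximum of each group (hypothetical helper)."""
    # parent[i] points towards the representative index of i's group
    parent = list(range(len(vectors)))

    def find(i):
        # Follow parent links to the representative, shortening the path
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of every pair that is closer than the threshold
    for (i, v1), (j, v2) in combinations(enumerate(vectors), 2):
        if euclidean(v1, v2) < threshold:
            parent[find(j)] = find(i)

    # Collect the members of each group, keyed by representative index
    groups = {}
    for i, v in enumerate(vectors):
        groups.setdefault(find(i), []).append(v)

    # zip(*members) iterates over columns, so max(col) is element-wise
    return [[max(col) for col in zip(*members)] for members in groups.values()]


yhat = [[10, 15, 200, 220], [20, 25, 200, 230], [30, 15, 200, 230],
        [100, 150, 230, 300], [110, 150, 240, 300]]

xhat = group_and_max(yhat, threshold=25)  # 25 is an assumed threshold
print(xhat)
# [[30, 25, 200, 230], [110, 150, 240, 300]]
```

With this input the first three lists form one group and the last two form another. Note that under this interpretation the second element of the first result is 25, not the 35 shown in the question. Grouping is transitive here: two lists end up together if a chain of close pairs connects them, even when they are not themselves within the threshold of each other.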