python · numpy · optimization · numerical-methods

How to minimize this distance faster with Numpy? (find shifting-index for which two signals are close to each other)


Given an array x of length 1000, and y of length 500k, we can compute the index k for which x is the closest to "y-shifted by k indices":

mindistance = np.inf  # infinity
for k in range(len(y) - len(x) + 1):  # all valid shifts, last one included
    t = np.sum(np.power(x - y[k:k+len(x)], 2))
    if t < mindistance:
        mindistance = t
        index = k
print(index)
# x is closest to y[index:index+len(x)]

According to my tests, this is computationally expensive. Is there a clever numpy way to compute it faster?


Note: if I reduce the length of x from 1000 to 100, the computation time barely changes. The slowness seems to come mostly from the for k in range(...) loop. How can I speed it up?


Solution

  • This can be done with np.correlate, which computes not the correlation coefficient (as one might guess), but simply sums of products like x[n]*y[m] (here m is n plus some shift). Since

    (x[n] - y[m])**2  = x[n]**2 - 2*x[n]*y[m] + y[m]**2
    

    we can get the sum of squares of differences from this, by adding the sums of squares of x and of a part of y. (Actually, the sum of x[n]**2 will not depend on the shift, since we'll always get just np.sum(x**2), but I'll include it all the same.) The sum of a part of y**2 can also be found in this way, by replacing x with an all-ones array of the same size, and y with y**2. Here is an example.

    import numpy as np
    x = np.array([3.1, 1.2, 4.2])
    y = np.array([8, 5, 3, -2, 3, 1, 4, 5, 7])
    diff_sq = np.sum(x**2) - 2*np.correlate(y, x) + np.correlate(y**2, np.ones_like(x))
    print(diff_sq)
    

    This prints [39.89 45.29 11.69 39.49 0.09 12.89 23.09], which are indeed the required squared distances from x to the successive parts of y. Pick the index of the smallest with np.argmin.
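    Putting the pieces together, here is a sketch of a complete helper (the name closest_shift is my own, not from the question) that returns the best shift directly; it reproduces the brute-force loop's result without any Python-level loop:

    ```python
    import numpy as np

    def closest_shift(x, y):
        """Return the shift k minimizing sum((x - y[k:k+len(x)])**2)."""
        # Cross terms sum_n x[n]*y[k+n] for every valid shift k at once.
        cross = np.correlate(y, x)
        # Sliding sums of y**2 over windows of length len(x):
        # correlating y**2 with an all-ones array of x's size.
        y_sq_sums = np.correlate(y**2, np.ones_like(x))
        # (x[n]-y[m])**2 = x[n]**2 - 2*x[n]*y[m] + y[m]**2, summed over n.
        diff_sq = np.sum(x**2) - 2 * cross + y_sq_sums
        return int(np.argmin(diff_sq))

    x = np.array([3.1, 1.2, 4.2])
    y = np.array([8, 5, 3, -2, 3, 1, 4, 5, 7])
    print(closest_shift(x, y))  # → 4, since diff_sq[4] = 0.09 is the minimum
    ```

    For the sizes in the question (len(x) = 1000, len(y) = 500k) this replaces ~500k Python iterations with two vectorized correlations, which is where the speedup comes from.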