
Algorithm for calculating the sum-of-squares distance of a rolling window from a given line function


Given a line function y = a*x + b (a and b are previously known constants), it is easy to calculate the sum-of-squares distance between the line and a window of samples (1, Y1), (2, Y2), ..., (n, Yn) (where Y1 is the oldest sample and Yn is the newest):

sum((Yx - (a*x + b))^2 for x in 1,...,n)

I need a fast algorithm for calculating this value for a rolling window (of length n) - I cannot rescan all the samples in the window every time a new sample arrives.
Obviously, some state should be saved and updated for every new sample that enters the window and every old sample leaves the window.
Notice that when a sample leaves the window, the indices of the remaining samples change as well - every Yx becomes Y(x-1). Therefore, when a sample leaves the window, every other sample in the window contributes a different value to the new sum: (Yx - (a*(x-1) + b))^2 instead of (Yx - (a*x + b))^2.

Is there a known algorithm for calculating this? If not, can you think of one? (It is ok to have some mistakes due to first-order linear approximations).
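For reference, here is the quantity in question computed naively in O(n) per window (the function name `line_sse` is just an illustrative choice):

```python
def line_sse(window, a, b):
    """Naive O(n) sum of squared distances from the line y = a*x + b.

    window[0] is the oldest sample (index x = 1), window[-1] the newest.
    """
    return sum((y - (a * x + b)) ** 2 for x, y in enumerate(window, start=1))
```

This is the baseline any rolling scheme must reproduce (up to floating-point error) while doing only O(1) work per new sample.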


Solution

  • If you expand the term (Yx - (a*x + b))^2 the terms break into three parts:

    1. Terms involving only a, x, and b, i.e. (a*x + b)^2. Summed over x = 1..n these produce a constant that depends only on the window length, so they can be precomputed once.
    2. Terms involving only Yx and b, namely Yx^2 and -2*b*Yx. These can be handled in the style of a boxcar integrator as @Xion described: keep running sums of Yx and Yx^2, subtracting the departing sample and adding the arriving one.
    3. One term of -2*Yx*a*x. The -2*a is a constant, so ignore that part and consider the weighted sum S = Y1*1 + Y2*2 + Y3*3 + ... + Yn*n. Given the running sum R = Y1 + Y2 + ... + Yn, the difference S - R eliminates the Y1*1 term and reduces the weight of each remaining term by one, leaving Y2*1 + Y3*2 + ... + Yn*(n-1) - exactly the re-indexed sum after Y1 leaves. Then update the running sum R as in (2) by subtracting Y1 and adding Y(n+1), and add the new term Y(n+1)*n to S.

    Now just add up all those partial terms.