Tags: matlab, fft, average, convolution, large-data

Extremely large weighted average


I am using 64-bit MATLAB with 32 GB of RAM (just so you know).

I have a file (vector) of 1.3 million numbers (integers). I want to make another vector of the same length, where each point is a weighted average of the entire first vector, weighted by the inverse distance from that position (actually it's position^-0.1, not ^-1, but for example purposes). I can't use MATLAB's filter function, because it only averages things before the current point, right? To explain more clearly, here's an example with 3 elements:

data = [ 2 6 9 ]
weights = [ 1 1/2 1/3; 1/2 1 1/2; 1/3 1/2 1 ]
results = data*weights = [ 8 11.5 12.666 ]
i.e.
8 = 2*1 + 6*1/2 + 9*1/3
11.5 = 2*1/2 + 6*1 + 9*1/2
12.666 = 2*1/3 + 6*1/2 + 9*1

So each point in the new vector is the weighted average of the entire first vector, weighted by 1/(distance from that position + 1).
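
That 3-element example can be checked directly in MATLAB by building the full weight matrix (fine at this size, hopeless at 1.3 million):

    data = [2 6 9];
    n = length(data);
    % Weight matrix: entry (i,j) is 1/(|i-j| + 1)
    [I, J] = meshgrid(1:n, 1:n);
    weights = 1 ./ (abs(I - J) + 1);
    results = data * weights   % gives [8 11.5 12.666...]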

I could just remake the weight vector for each point and calculate the results vector element by element, but this requires 1.3 million iterations of a for loop, each of which contains 1.3 million multiplications. I would rather use straight matrix multiplication, multiplying a 1×1.3M vector by a 1.3M×1.3M matrix, which works in theory, but I can't load a matrix that large. For reference, the loop version would look like the sketch below (with my actual (distance+1)^-0.1 weighting), which is exactly what I'm trying to avoid:
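
    n = 1.3e6;                  % vector length from the question
    data = randi(10, 1, n);     % stand-in for the real data
    results = zeros(1, n);
    for i = 1:n
        % Rebuild the whole weight vector for position i on every pass
        w = (abs((1:n) - i) + 1) .^ (-0.1);
        results(i) = data * w.';
    end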

I have also tried building the matrix with a shell script and indexing it in MATLAB so that only the relevant column of the matrix is loaded at a time, but that is also taking a very long time.

I don't have to do this in MATLAB, so any advice about working with numbers this large and getting averages would be appreciated. Since I am using a weight of ^-0.1, and not ^-1, it does not drop off that fast: the millionth point is still weighted at 0.25 compared to the original point's weighting of 1, so I can't just cut the kernel off as the distance gets big, either.
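
The slow decay is easy to confirm:

    % Relative weight of a point one million positions away
    (1e6 + 1) ^ (-0.1)   % ans = 0.2512 -- nowhere near negligible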

Hope this was clear enough?

Here is the code for the answer below (posted here so it can be formatted):

data = load('/Users/mmanary/Documents/test/insertion.txt');
data = data.';                 % make it a row vector
total = length(data);
x = 1:total;

% Zero-pad in front so circular (FFT) convolution yields the two-sided average
datapad = [zeros(1,total) data];

% Two-sided kernel decaying as (distance+1)^-0.4, normalized so the result is an average
weights = ([(total+1):-1:2 1:total]).^(-.4);
weights = weights/sum(weights);

% Convolution via FFT: multiply in the frequency domain, then transform back
Fdata = fft(datapad);
Fweights = fft(weights);
Fresults = Fdata .* Fweights;
results = ifft(Fresults);
results = results(1:total);    % keep only the valid (unwrapped) part
plot(x,results)

Solution

  • The only sensible way to do this is with FFT convolution, the same technique that underpins fast filtering functions such as fftfilt. It is very easy to do manually:

    % Simulate some data
    n = 10^6;
    x = randi(10,1,n);
    xpad = [zeros(1,n) x];
    
    % Setup smoothing kernel
    k = 1 ./ [(n+1):-1:2 1:n];
    
    % FFT convolution
    Fx = fft(xpad);
    Fk = fft(k);
    
    Fxk = Fx .* Fk;
    
    xk = ifft(Fxk);
    xk = xk(1:n);
    

    Takes less than half a second for n=10^6!
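
    As a sanity check (a sketch, not part of the answer above), the FFT result can be compared against the direct weight-matrix product at a small n, here using the question's actual (distance+1)^-0.1 kernel:

        % Direct O(n^2) computation vs. FFT convolution, small n only
        n = 1000;
        x = randi(10, 1, n);

        % Direct: full n-by-n weight matrix, entry (i,j) = (|i-j|+1)^-0.1
        [I, J] = meshgrid(1:n, 1:n);
        direct = x * ((abs(I - J) + 1) .^ (-0.1));

        % FFT convolution with the same two-sided kernel
        xpad = [zeros(1, n) x];
        k = ([(n+1):-1:2, 1:n]) .^ (-0.1);
        xk = ifft(fft(xpad) .* fft(k));
        xk = xk(1:n);

        max(abs(direct - xk))   % floating-point noise, on the order of 1e-10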