math cuda filtering signal-processing thrust

Implementing an Exponential Moving Average Filter described by a difference equation in CUDA

I'm currently trying to use CUDA to evaluate a difference equation that represents an exponential moving average filter. The filter is described by the following difference equation

y[n] = y[n-1] * beta + alpha * x[n]

where alpha and beta are constants defined as

alpha = (2.0 / (1 + Period))
beta = 1 - alpha

How can the above difference equation be manipulated to get the system response for this filter? What would be an efficient way to implement this filter on the GPU?

I am working on a GTX 570.

Solution

I would propose to manipulate the above difference equation as indicated below and then using CUDA Thrust primitives.

DIFFERENCE EQUATION MANIPULATION - EXPLICIT FORM OF THE DIFFERENCE EQUATION

By simple algebra, you can find the following:

y[1] = beta * y[0] + alpha * x[1]

y[2] = beta^2 * y[0] + alpha * beta * x[1] + alpha * x[2]

y[3] = beta^3 * y[0] + alpha * beta^2 * x[1] + alpha * beta * x[2] + alpha * x[3]

Accordingly, the explicit form is the following:

y[n] / beta^n = y[0] + alpha * x[1] / beta + alpha * x[2] / beta^2 + ...

CUDA THRUST IMPLEMENTATION

You can implement the above explicit form by the following steps:

Initialize an input sequence d_input to alpha except for d_input[0] = 1.;
Define a vector d_1_over_beta_to_the_n equal to 1, 1/beta, 1/beta^2, 1/beta^3, ...;
Multiply elementwise d_input by d_1_over_beta_to_the_n;
Perform an inclusive_scan to obtain the sequence of the y[n] / beta^n;
Divide the above sequence by 1, 1/beta, 1/beta^2, 1/beta^3, ....

EDIT

The above approach can be recommended for Linear Time-Varying (LTV) systems. For Linear Time-Invariant (LTI) systems, the FFT approach mentioned by Paul can be recommended. I'm providing an example of that approach by using CUDA Thrust and cuFFT in my answer to FIR filter in CUDA.

FULL CODE

#include <thrust/sequence.h>

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

int main(void)
{
    int N = 20;

    // --- Filter parameters
    double alpha    = 2.7;
    double beta     = -0.3;

    // --- Defining and initializing the input vector on the device
    thrust::device_vector<double> d_input(N,alpha * 1.);
    d_input[0] = d_input[0]/alpha;

    // --- Defining the output vector on the device
    thrust::device_vector<double> d_output(d_input);

    // --- Defining the {1/beta^n} sequence
    thrust::device_vector<double> d_1_over_beta(N,1./beta);
    thrust::device_vector<double> d_1_over_beta_to_the_n(N,1./beta);
    thrust::device_vector<double> d_n(N);
    thrust::sequence(d_n.begin(), d_n.end());
    thrust::inclusive_scan(d_1_over_beta.begin(), d_1_over_beta.end(), d_1_over_beta_to_the_n.begin(), thrust::multiplies<double>());
    thrust::transform(d_1_over_beta_to_the_n.begin(), d_1_over_beta_to_the_n.end(), d_input.begin(), d_input.begin(), thrust::multiplies<double>());    
    thrust::inclusive_scan(d_input.begin(), d_input.end(), d_output.begin(), thrust::plus<double>());
    thrust::transform(d_output.begin(), d_output.end(), d_1_over_beta_to_the_n.begin(), d_output.begin(), thrust::divides<double>());

    for (int i=0; i<N; i++) {
        double val = d_output[i];
        printf("Device vector element number %i equal to %f\n",i,val);
    }

    // --- Defining and initializing the input vector on the host
    thrust::host_vector<double> h_input(N,1.);

    // --- Defining the output vector on the host
    thrust::host_vector<double> h_output(h_input);

    h_output[0] = h_input[0];
    for(int i=1; i<N; i++)
    {
        h_output[i] = h_input[i] * alpha + beta * h_output[i-1];
    }

    for (int i=0; i<N; i++) {
        double val = h_output[i];
        printf("Host vector element number %i equal to %f\n",i,val);
    }

    for (int i=0; i<N; i++) {
        double val = h_output[i] - d_output[i];
        printf("Difference between host and device vector element number %i equal to %f\n",i,val);
    }

    getchar();
}