Performing several 1D moving averages in parallel using CUDA Thrust


I'm not a programmer with any abilities. Just someone curious about CUDA and so I'm doing a little reading. I ran across an example of using Thrust to do a moving average:

Simple Moving Average Thrust Example

The example, such as it is, runs and mostly works correctly. However, it's trivial in the sense that it only does one moving average operation.

How would I do, say, 352 of these moving average operations in parallel, all operating on the same data stream? In my mind the program flow might be:

  1. Generate the data & send it to one CUDA core. (Same as existing code but think lengths of 1000 or 10000 instead of 30)
  2. Copy it from the CUDA core it's in to all of the other 351 CUDA cores in my GTX 465
  3. Tell each CUDA core what number of data items to average over. (4, 5, 6,..., 352, 353, 354)
  4. Tell the device to run the average in each core in parallel
  5. Read back the results from each core

I get that this code

// compute SMA using standard summation
simple_moving_average(data, w, averages);

makes it all happen, but how do I get Thrust to do many of these in parallel?

My interest here is about something like stock data. If I'm looking at GOOG prices I'd put that in the GPU using all cores and leave it there. I'd then be free to do lots of processing without loading the data anymore and just reading back results from each core. NOTE: I might not want to use GOOG in all cores. Some cores might be GOOG, others with some other symbol, but I'll get there later. I'm just thinking I don't want the stock data in global memory if there's enough room in each core.

I assume this is pretty straightforward for CUDA & Thrust?


Solution

  • Here is one possible way to do this with ArrayFire. Note that I am NOT affiliated with this library in any way.
    I am fairly sure this can also be done with Thrust, but I found it a lot simpler with ArrayFire. And if the library is free, why not use it instead of Thrust?

    In ArrayFire you can use a matrix to run several SMA operations in parallel, with one price series per row:

    unsigned n_SMAs = 1000;   // # of SMA indicators to evaluate 
    unsigned len = 2000;      // # of stock prices per indicator
    unsigned w = 6; // window size
    
    // generate stock prices: [0..10] 
    af::array data = af::randu(n_SMAs, len) * 10;
    
    // compute inclusive prefix sums along dimension 1 (across the columns of each row)
    af::array s = af::accum(data, 1);
    
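    // compute the average: the difference of two prefix sums w columns apart
    // is the sum over one window (the very first window, columns 0..w-1, is skipped)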
    af::array avg = (s.cols(w, af::end) - s.cols(0, af::end - w)) / w;
    af::eval(avg);
    
    std::cout << avg.dims() << "\n" << avg << "\n";
    

    Let me know if that's what you are looking for. This is how I understood your question: compute several SMA indicators in parallel.
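
The prefix-sum trick used above also covers the case in the question, where many window sizes are applied to the same data: the running sum is computed once, and every average for every window size is then a difference of two of its entries. Below is a minimal CPU-only sketch of that idea in standard C++. It is not Thrust or ArrayFire code, and `moving_averages` is a hypothetical helper name introduced only for illustration.

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Thrust or ArrayFire): computes the simple
// moving averages of `data` for every window size in `windows`, reusing a
// single prefix-sum array for all of them.
std::vector<std::vector<double>> moving_averages(
    const std::vector<double>& data,
    const std::vector<std::size_t>& windows)
{
    // prefix[i] holds the sum of data[0..i-1]; prefix[0] is 0.
    std::vector<double> prefix(data.size() + 1, 0.0);
    std::partial_sum(data.begin(), data.end(), prefix.begin() + 1);

    std::vector<std::vector<double>> result;
    result.reserve(windows.size());
    for (std::size_t w : windows) {
        std::vector<double> avg;
        // Window sizes of 0 or larger than the data yield an empty result.
        if (w > 0 && w <= data.size()) {
            avg.reserve(data.size() - w + 1);
            // The window starting at i covers data[i..i+w-1].
            for (std::size_t i = 0; i + w <= data.size(); ++i)
                avg.push_back((prefix[i + w] - prefix[i]) / static_cast<double>(w));
        }
        result.push_back(std::move(avg));
    }
    return result;
}
```

Every output element depends only on two entries of `prefix`, so each (window size, position) pair can be computed independently. That independence is what makes the approach a natural fit for a GPU, where the prefix sum would typically be done with a scan primitive and the differences with an element-wise pass; the details of such a port are left as an assumption here.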