I have been trying to write a function that will take a histogram of a vector using the accelerate library. I recognize that histograms aren't the ideal case for GPU processing, but I'm generating a fairly large dataset from a small seed, and it would be nice if it could be reduced to an array of a few kilobytes before transferring it back to main memory.
The code that I've come up with is below. It takes a number of output bins, then creates a new array a where a[x] is the number of occurrences of x in xs.
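import qualified Data.Array.Accelerate as A

hist :: A.Exp Int -> A.Acc (A.Vector Int) -> A.Acc (A.Vector Int)
hist bins xs = A.permute
    (const (+1))                 -- combine: ignore the incoming value, add 1 to the bin
    (A.fill (A.index1 bins) 0)   -- defaults: one zeroed counter per bin
    (A.index1 . (xs A.!))        -- send source index i to bin xs[i]
    xs                           -- source array (its values are ignored by the combine step)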
The code appears to run properly under the Accelerate interpreter. However, if I try to call it through accelerate-cuda, I get the following error message.
./Data/Array/Accelerate/CUDA/State.hs:85:9: (unhandled): CUDA Exception: unspecified launch failure
My question is two-fold. First, what am I doing that causes CUDA to fail? Second, is there a better way to take a histogram through Accelerate?
This was a bug in Accelerate (and/or an underlying change in CUDA), which has now been fixed. Apologies for taking so long to get to it; this slipped off my radar.
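On the second question, here is a minimal sketch of an alternative formulation, similar to the histogram example I recall from the `permute` documentation. It permutes an array of ones with `(+)` as the combining function, which is associative and commutative (as I understand `permute` expects), whereas `const (+1)` is not commutative. The names `histogram`, `zeros` and `ones` are my own. It assumes every element of `xs` lies in `[0, bins)`, and it assumes the same older Accelerate API as the question, where the index function returns a plain index; on Accelerate 1.3 and later I believe the result needs wrapping as `A.Just_ (A.index1 (xs A.! ix))`.

```haskell
import qualified Data.Array.Accelerate as A

-- Count how often each value occurs in xs.
-- Assumption: every element of xs lies in [0, bins).
histogram :: A.Exp Int -> A.Acc (A.Vector Int) -> A.Acc (A.Vector Int)
histogram bins xs = A.permute (+) zeros (\ix -> A.index1 (xs A.! ix)) ones
  where
    zeros = A.fill (A.index1 bins) 0   -- one counter per bin, initially 0
    ones  = A.fill (A.shape xs)    1   -- each input element contributes 1
```

It can be tried with the interpreter backend, for example `run (histogram 4 (A.use xs))` with `run` imported from `Data.Array.Accelerate.Interpreter`, before switching to the CUDA backend.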