I've implemented a neural network in MATLAB to get a better understanding of the topic. I wanted to run the code on my GPU, so I initialized every matrix with `gpuArray()`, but got no performance boost. Moreover, the GPU is sometimes slower than the CPU. I have already learned to use functions like `arrayfun`, `pagefun`, and so on.
In backprop I have a `for` loop that computes the delta error for every layer, backwards. However, each iteration needs the result of the previous one, and I have no idea how to express that with the `*fun()` functions.
My CPU is an i5-3570 and my GPU is a GTX 660 Ti. I already ran GPUBench in MATLAB; the GPU is x times faster than the CPU, so I think the mistake is in my code.
TL;DR: How do I improve this MATLAB code for GPU computing?
```matlab
delta_output = (predicted - NN.Y) .* NN.activationGradient(predicted);
delta_hidden(:, :, m) = (delta_output * NN.Theta_output) .* ...
                        NN.activationGradient(NN.a_hidden(:, :, m));

for i = m-1:-1:1
    delta_hidden(:, :, i) = (delta_hidden(:, 2:end, i+1) * ...
                             NN.Theta_hidden(:, :, i)) .* ...
                            NN.activationGradient(NN.a_hidden(:, :, i));
end
```
`predicted`, `NN.Y`, and `NN.Theta_*` are all `gpuArray`s. I also initialized `delta_*` as a `gpuArray`, but it doesn't make any difference.
The advantage of using the GPU for neural networks comes not from computing the updates for every layer at once - that's inherently serial, as you point out. It comes from being able to compute the update for the weights on thousands of neurons in each layer at once.
So I suspect that you simply do not have a large enough network to make using the GPU advantageous. What is the size of your weight matrix at each layer? If it doesn't contain at least 1000 elements, you're probably not going to see much advantage over the highly optimised, multi-core, intrinsically vectorised computation that your CPU is doing.
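To make that concrete, here is a minimal timing sketch (assuming the Parallel Computing Toolbox is installed, which provides `gpuArray` and `gputimeit`) that compares one layer-sized matrix product on the CPU and the GPU at two layer sizes. The sizes 64 and 4096 are just illustrative stand-ins for a small and a large layer; the exact crossover point depends on your hardware:

```matlab
% Compare CPU vs GPU time for one layer-sized matrix product.
% n = 64 stands in for a small layer, n = 4096 for a large one.
for n = [64 4096]
    A  = rand(n);   B  = rand(n);
    gA = gpuArray(A);  gB = gpuArray(B);

    tCpu = timeit(@() A * B);        % median CPU time
    tGpu = gputimeit(@() gA * gB);   % median GPU time, incl. synchronisation

    fprintf('n = %4d: CPU %.3g s, GPU %.3g s\n', n, tCpu, tGpu);
end
```

On hardware like a GTX 660 Ti you would typically expect the small case to favour the CPU (kernel-launch and synchronisation overhead dominates the tiny amount of arithmetic) and only the large case to favour the GPU.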