The code below is correct, but I want to vectorize it (and possibly move it to the GPU) to speed it up.
How can I convert it to vectorized form?
RF = 4;
inhibatory = 0;
overlap=3;
act_funct = 'sig';
gap = RF-overlap;
Image1 = rand(30,22);
Image2 = rand(27,19);
size_image2 = size(Image2); % 27x19
Image3 = rand(30,22);
de_act_output = de_activate_Mat(Image1,act_funct); % derivative of the activation, e.g. de_act_output = act_output.*(1-act_output) for the sigmoid
result = zeros(size(Image1)); % pre-allocate the output
for u=1:size(Image1,1)
    for v=1:size(Image1,2)
        iLowMax=max(ceil((u-(RF+inhibatory))/(gap-inhibatory)),1);
        iHighMax=min(floor((u-1)/(gap-inhibatory))+1, size_image2(1));
        jLowMax=max(ceil((v-(RF+inhibatory))/(gap-inhibatory)),1);
        jHighMax = min(floor((v-1)/(gap-inhibatory))+1, size_image2(2));
        sum_sens = sum(sum(Image2(iLowMax:iHighMax,jLowMax:jHighMax)));
        sum_val = sum_sens .* Image3(u,v);
        result(u,v) = de_act_output(u,v) .* sum_val;
    end
end
The blocks you extract inside the nested loops with iLowMax:iHighMax,jLowMax:jHighMax form a parallelogram-like (sliding, overlapping) structure, which doesn't lead to easily vectorizable code. You can still go for full vectorization there if performance is paramount in your case, and convolution looks like it would be of good use. Listed here are some tweaks that make everything around that step faster by pre-calculating most of the other quantities, which should result in an appreciable speedup. Here's the implementation:
U = 1:size(Image1,1); %// Create arrays of iteration steps
V = 1:size(Image1,2);
%// Calculate arrays of low-high row and column indices
iLowMax=max(ceil((U-(RF+inhibatory))/(gap-inhibatory)),1);
iHighMax=min(floor((U-1)/(gap-inhibatory))+1, size_image2(1));
jLowMax=max(ceil((V-(RF+inhibatory))/(gap-inhibatory)),1);
jHighMax = min(floor((V-1)/(gap-inhibatory))+1, size_image2(2));
sens_sums = zeros(size(Image1)); %// Pre-allocation
for u=1:size(Image1,1)
    for v=1:size(Image1,2)
        sens = Image2(iLowMax(u):iHighMax(u),jLowMax(v):jHighMax(v));
        sens_sums(u,v) = sum(sens(:));
    end
end
result = sens_sums.*Image3.*de_act_output;
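If you do want the full vectorization, one option (a different technique from the loop above, not something this answer originally uses) follows from the fact that the row range depends only on u and the column range only on v. Each block sum is therefore separable, and all of them can be obtained at once as R * Image2 * C, where R and C are 0/1 indicator matrices marking which rows and columns of Image2 fall inside each window. The sketch below assumes `de_activate_Mat(x,'sig')` returns `x.*(1-x)`, as the question's comment suggests, relies on implicit expansion (MATLAB R2016b or later, or Octave), and introduces hypothetical names `step`, `R`, `C`, `rows2`, `cols2` and `sens_conv`. For the special case gap - inhibatory == 1 (the parameters in the question), the same sums are a plain 2-D box convolution, so a `conv2` cross-check is included.

```matlab
% Sketch: remove the remaining (u,v) loop via separable block sums.
% Assumption: de_activate_Mat(x,'sig') == x.*(1-x), per the question's comment.
RF = 4; inhibatory = 0; overlap = 3;
gap = RF - overlap;
Image1 = rand(30,22);
Image2 = rand(27,19);
Image3 = rand(30,22);
size_image2 = size(Image2);
de_act_output = Image1 .* (1 - Image1);     % assumed sigmoid derivative

step = gap - inhibatory;
U = (1:size(Image1,1)).';                   % column vector of output rows
V = 1:size(Image1,2);                       % row vector of output columns
iLowMax  = max(ceil((U-(RF+inhibatory))/step), 1);
iHighMax = min(floor((U-1)/step)+1, size_image2(1));
jLowMax  = max(ceil((V-(RF+inhibatory))/step), 1);
jHighMax = min(floor((V-1)/step)+1, size_image2(2));

% R(u,i) = 1 when row i of Image2 lies in iLowMax(u):iHighMax(u)
% C(j,v) = 1 when column j of Image2 lies in jLowMax(v):jHighMax(v)
rows2 = 1:size_image2(1);                   % 1x27
cols2 = (1:size_image2(2)).';               % 19x1
R = double(rows2 >= iLowMax & rows2 <= iHighMax);   % 30x27 (implicit expansion)
C = double(cols2 >= jLowMax & cols2 <= jHighMax);   % 19x22 (implicit expansion)

% All block sums at once: (30x27) * (27x19) * (19x22) -> 30x22
sens_sums = R * Image2 * C;
result = sens_sums .* Image3 .* de_act_output;

% Special case step == 1: each window is a trailing box of size
% RF+inhibatory+1, so a 'full' convolution with a ones kernel gives the same sums
% (valid here because size(Image1) does not exceed the 'full' output size).
if step == 1
    K = RF + inhibatory + 1;
    full_sums = conv2(Image2, ones(K), 'full');
    sens_conv = full_sums(1:size(Image1,1), 1:size(Image1,2));
end
```

Because this reduces to matrix products and elementwise operations, wrapping the inputs with `gpuArray` (Parallel Computing Toolbox) should carry over to the GPU without code changes, though whether that pays off at 30x22 would need to be measured.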