Tags: matlab, neural-network, vectorization, bsxfun

How can I avoid loops by vectorizing the code below?


The code below is correct, but I want to vectorize it (and possibly move it to the GPU) to make it faster.

How can I convert it to vectorized form?

RF = 4;     
inhibatory = 0;    
overlap=3;   
act_funct = 'sig';
gap = RF-overlap;    
Image1 = rand(30,22);
Image2 = rand(27,19);
size_image2 = size(Image2); % [27 19]
Image3 = rand(30,22);
% de_activate_Mat is my own function; it returns the derivative of the activation,
% e.g. de_act_output = act_output.*(1-act_output) in the case of the sigmoid.
de_act_output = de_activate_Mat(Image1,act_funct);
result = zeros(size(Image1)); % pre-allocate the output
for u=1:size(Image1,1)
    for v=1:size(Image1,2)
        % Row and column range of the Image2 block that contributes to pixel (u,v)
        iLowMax=max(ceil((u-(RF+inhibatory))/(gap-inhibatory)),1);
        iHighMax=min(floor((u-1)/(gap-inhibatory))+1, size_image2(1));
        jLowMax=max(ceil((v-(RF+inhibatory))/(gap-inhibatory)),1);
        jHighMax = min(floor((v-1)/(gap-inhibatory))+1, size_image2(2));
        % Sum the block, then scale by Image3 and the activation derivative
        sum_sens = sum(sum(Image2(iLowMax:iHighMax,jLowMax:jHighMax)));
        result(u,v) = de_act_output(u,v) * sum_sens * Image3(u,v);
    end
end

Solution

  • The blocks you index inside the nested loops with iLowMax:iHighMax,jLowMax:jHighMax form a parallelogram-like pattern, which does not lend itself to easily vectorizable code. If performance is paramount, you could still fully vectorize that step, and convolution looks like a good fit for it. The tweaks listed here instead speed up everything around that step: the index arrays are pre-calculated once, and the final multiplication is moved out of the loops. This should give an appreciable speedup. Here's the implementation:

    U = 1:size(Image1,1); %// Create arrays of iteration steps
    V = 1:size(Image1,2);
    
    %// Calculate arrays of low-high row and column indices 
    iLowMax=max(ceil((U-(RF+inhibatory))/(gap-inhibatory)),1);
    iHighMax=min(floor((U-1)/(gap-inhibatory))+1, size_image2(1));
    
    jLowMax=max(ceil((V-(RF+inhibatory))/(gap-inhibatory)),1);
    jHighMax = min(floor((V-1)/(gap-inhibatory))+1, size_image2(2));
    
    sens_sums(size(Image1,1),size(Image1,2)) = 0; %// Pre-allocation
    for u=1:size(Image1,1)
        for v=1:size(Image1,2)
            sens = Image2(iLowMax(u):iHighMax(u),jLowMax(v):jHighMax(v));
            sens_sums(u,v) = sum(sens(:));
        end
    end
    result = sens_sums.*Image3.*de_act_output;
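
The remaining double loop only computes rectangular block sums of Image2, so one way to remove it is a summed-area table built with cumsum. The block below is a minimal sketch of that idea, not part of the original answer. It assumes the sizes and parameters from the question, and since de_activate_Mat is not shown, it uses Image1.*(1-Image1) as a stand-in for de_act_output. The names step, P, ref, conv_sums, max_err and conv_err are introduced here for illustration only.

```matlab
% Parameters and data as in the question
RF = 4; inhibatory = 0; overlap = 3;
gap = RF - overlap;
step = gap - inhibatory;               % hypothetical helper name
Image1 = rand(30,22);
Image2 = rand(27,19);
Image3 = rand(30,22);
size_image2 = size(Image2);
% Assumption: stand-in for de_activate_Mat(Image1,'sig'), which is not shown
de_act_output = Image1 .* (1 - Image1);

% Index arrays, computed once as in the answer above
U = 1:size(Image1,1);
V = 1:size(Image1,2);
iLow  = max(ceil((U-(RF+inhibatory))/step), 1);
iHigh = min(floor((U-1)/step)+1, size_image2(1));
jLow  = max(ceil((V-(RF+inhibatory))/step), 1);
jHigh = min(floor((V-1)/step)+1, size_image2(2));

% Clamp so that an empty range (low > high) yields a zero sum, as in the loop
iLow  = min(iLow, size_image2(1)+1);  iHigh = max(iHigh, iLow-1);
jLow  = min(jLow, size_image2(2)+1);  jHigh = max(jHigh, jLow-1);

% Summed-area table with a leading row and column of zeros:
% P(r+1,c+1) = sum of Image2(1:r,1:c)
P = zeros(size_image2 + 1);
P(2:end,2:end) = cumsum(cumsum(Image2,1),2);

% Block sum over rows a:b, cols c:d = P(b+1,d+1)-P(a,d+1)-P(b+1,c)+P(a,c)
sens_sums = P(iHigh+1,jHigh+1) - P(iLow,jHigh+1) ...
          - P(iHigh+1,jLow)   + P(iLow,jLow);
result = sens_sums .* Image3 .* de_act_output;

% Cross-check against the loop version
ref = zeros(size(Image1));
for u = U
    for v = V
        blk = Image2(iLow(u):iHigh(u), jLow(v):jHigh(v));
        ref(u,v) = sum(blk(:));
    end
end
max_err = max(abs(sens_sums(:) - ref(:)));   % rounding-level difference expected

% For step == 1 only, the same sums come from a full 2-D convolution
% with a box kernel of side RF+inhibatory+1, cropped to size(Image1)
C = conv2(Image2, ones(RF+inhibatory+1));
conv_sums = C(U, V);
conv_err = max(abs(conv_sums(:) - ref(:)));
```

Because the table uses differences of running sums, the values can differ from the loop result by floating-point rounding, so compare with a tolerance rather than isequal. The conv2 shortcut matches only when gap-inhibatory is 1, as it is with the question's values; the cumsum version does not rely on that.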