optimize Octave code for Zhang Suen thinning algorithm

I would like to know if it is possible to vectorize (cf page 599 of this doc) operations that need a whole matrix scan, but with a lot of conditions to check concerning neighbours pixels. The goal is to make it faster, because the code is working when I use a for loop with 10 iterations, but I tried using a while loop, and it never ends. It might also be me screwing the stop condition, but I think the operations could still be faster. The operations I want to optimize are described here. Here is the code that I want to optimize:

    while (stopCond>0)
      
      stopCond = 0;
      
      ap1 = 0;
      bp1 = 0;
      tabPixel = [];
      for x=2:NL-1
          for y= 2:NC-1
            p1 = imgSeuil(x,y);  %current pixel
            p2 = imgSeuil(x-1, y); %pixel neighbours
            p3 = imgSeuil(x-1, y+1);
            p4 = imgSeuil(x, y+1);
            p5 = imgSeuil(x+1, y+1);
            p6 = imgSeuil(x+1, y);
            p7 = imgSeuil(x+1, y-1);
            p8 = imgSeuil(x, y-1);
            p9 = imgSeuil(x-1, y-1);
            
            tabNeighbour = [p2, p3, p4, p5, p6, p7, p8, p9];
            tmpTabl = diff([tabNeighbour, p2]);
            tmpTabl = max(tmpTabl, 0);
            
            ap1 = sum(tmpTabl);
            bp1 = sum(tabNeighbour);
    %%%--------Can I vectorize ops with below conditions--------
            if((p1==1)&&(bp1>=2)&&(bp1<=6)&&(ap1==1)&&
              ((p2==0)||(p4==0)||(p6==0))&&
              ((p4==0)||(p6==0)||(p8==0)))
%%%adding indexes of current pxl matching these conditions to then change their value when loop is over
                tabPixel = [tabPixel, [x; y]];
                stopCond += 1;
              endif
          endfor
      endfor
    
      for i=2:columns(tabPixel)
        imgSeuil(tabPixel(1, i), tabPixel(2, i)) = 0;
      endfor

I also read that using bolean indexing is encouraged, and I would like to know if those changes would significantly improve exec time.

Here is the whole code if you want to run it :

clear all;
close all;

img=imread("/home/redouane/Documents/L3/S6/TIA/TD/ED_3_6_originale.png");
imshow(img);
colorbar();

sizeImg = size(img);
NL=sizeImg(1,1);
NC=sizeImg(1,2);

tab=zeros(2,256); %tab de niveaux de gris
tab(1,1:256)=0:255;

%remplissage tab niveaux de gris
%et affichage de l'histogramme
for y=1:NL
    for x= 1:NC
        val=img(y,x);
        tab(2,val+1)=tab(2,val+1)+1;
    end
end

ticktab=zeros(1,25);

for i=1:25
    ticktab(1, i)=10*i;
end

figure(2);
plot(tab(1, 1:256),tab(2, 1:256));
set(gca,'XTick',ticktab(1, 1:25));
xlim([0, 255]);
%seuillage de l'img
figure(3);
imgSeuil=img;
for y=1:NL
    for x= 1:NC
        val=imgSeuil(y,x);
        if(val<30)
            imgSeuil(y,x)=0;
        else
            imgSeuil(y,x)=255;
        end
    end
end
imgSeuil=~imgSeuil;%inversion pour lignes blanches
imshow(imgSeuil);

%squelettisation: cf Zhang Suen algorithm sur rosetta code
%%%--------------

stopCond = 1;

while (stopCond>0)
  
  stopCond = 0;
  
  ap1 = 0;
  bp1 = 0;
  tabPixel = [];
  for x=2:NL-1
      for y= 2:NC-1
        p1 = imgSeuil(x,y);  %les voisins du pixel
        p2 = imgSeuil(x-1, y);
        p3 = imgSeuil(x-1, y+1);
        p4 = imgSeuil(x, y+1);
        p5 = imgSeuil(x+1, y+1);
        p6 = imgSeuil(x+1, y);
        p7 = imgSeuil(x+1, y-1);
        p8 = imgSeuil(x, y-1);
        p9 = imgSeuil(x-1, y-1);
        
        tabNeighbour = [p2, p3, p4, p5, p6, p7, p8, p9];
        tmpTabl = diff([tabNeighbour, p2]);
        tmpTabl = max(tmpTabl, 0);
        
        ap1 = sum(tmpTabl);
        bp1 = sum(tabNeighbour);

        if((p1==1)&&(bp1>=2)&&(bp1<=6)&&(ap1==1)&&
          ((p2==0)||(p4==0)||(p6==0))&&
          ((p4==0)||(p6==0)||(p8==0)))
            tabPixel = [tabPixel, [x; y]];
            stopCond += 1;
          endif
      endfor
  endfor

  for i=2:columns(tabPixel)
    imgSeuil(tabPixel(1, i), tabPixel(2, i)) = 0;
  endfor

  ap1 = 0;
  bp1 = 0;
  tabPixel = [];

  for x=2:NL-1
      for y= 2:NC-1
        p1 = imgSeuil(x,y);
        p2 = imgSeuil(x-1, y);
        p3 = imgSeuil(x-1, y+1);
        p4 = imgSeuil(x, y+1);
        p5 = imgSeuil(x+1, y+1);
        p6 = imgSeuil(x+1, y);
        p7 = imgSeuil(x+1, y-1);
        p8 = imgSeuil(x, y-1);
        p9 = imgSeuil(x-1, y-1);
        tabNeighbour = [p2, p3, p4, p5, p6, p7, p8, p9];
        ap1 = sum(diff([tabNeighbour, p2]));
        bp1=sum(tabNeighbour);
          if((p1==1)&&(bp1>=2)&&(bp1<=6)&&(ap1==1)&&
          ((p2==0)||(p4==0)||(p8==0))&&
          ((p2==0)||(p6==0)||(p8==0)))
            tabPixel=[tabPixel, x; y];
            stopCond += 1;
          endif
      endfor
  endfor

  for i=1:columns(tabPixel)
    imgSeuil(tabPixel(1, i), tabPixel(2, i))=0;
  endfor
endwhile


figure(4);
imshow(imgSeuil);

%%%-------------

##tabSquel=zeros(1,10);
##hold on;
##for y=2:NL-1
##        for x= 2:NC-1
##            % on utilise ces valeurs pour ne pas acceder aux bords de l'image
##            %Pixel1 (P1) correspond à imgSeuil(y,x), c'est le pixel du milieu et on etudie ses voisins
##            A=0;%nombre de transi de 0 à 1
##            B=0;%nombre de voisins
##            
##            tabSquel(1,2)=imgSeuil(y,x);%p1
##            tabSquel(1,2)=imgSeuil(y-1,x);%p2
##            tabSquel(1,3)=imgSeuil(y-1,x+1);%p3
##            tabSquel(1,4)=imgSeuil(y,x+1);%p4
##            tabSquel(1,5)=imgSeuil(y+1,x+1);%p5
##            tabSquel(1,6)=imgSeuil(y+1,x);%p6
##            tabSquel(1,7)=imgSeuil(y+1,x-1);%p7
##            tabSquel(1,8)=imgSeuil(y,x-1);%p8
##            tabSquel(1,9)=imgSeuil(y-1,x-1);%p9
##            tabSquel(1,10)=imgSeuil(y-1,x);
##
##
##sum = (0.5*(abs(tabSquel(1,6)-tabSquel(1,1)) + abs(tabSquel(1,7)-tabSquel(1,6)) + abs(tabSquel(1,8)-tabSquel(1,7)) + abs(tabSquel(1,9)-tabSquel(1,8)) + abs(tabSquel(1,2)-tabSquel(1,9)) + abs(tabSquel(1,3)-tabSquel(1,2)) + abs(tabSquel(1,4)-tabSquel(1,3)) + abs(tabSquel(1,5)-tabSquel(1,4))));
##          if(sum==3)
##              plot(y,x, "ro-");
##          end
##    end
##end
%imshow(imgSeuil);

Solution

I decided to try out the idea of generating a lookup table for each of the 256 possible neighbor combinations. Given a kernel:

encoding_kernel =
     1   128    64
     2     0    32
     4     8    16

we can use filter2 to encode each neighborhood of 8 pixels to an integer in the range 0..255.

Now we just need to determine which bit combinations satisfy the conditions. You'll notice that P1 is multiplied by 0 in the kernel. We'll just use the thresholded image directly to get the value of P1. I've only coded the LUT for Step 1 in the algorithm, but Step 2 is virtually identical.

% generate Step 1 lookup table for encoded neighborhood of P1
%
% encoding_kernel
%    1   128    64
%    2     0    32
%    4     8    16
    
LUT_Step1 = ones(256, 1);

% generate binary values for keys 0..255
% pixel ordering is: [P2 P3 P4 P5 P6 P7 P8 P9]
binary_keys = dec2bin(0:255, 8)-'0';   % -'0' makes the array numeric

% LUT = LUT AND (2 <= B(P1) <= 6)
B_P1 = sum(binary_keys, 2);
LUT_Step1 = LUT_Step1 & (2 <= B_P1) & (B_P1 <= 6);

% Generate A(P1) by finding the transitions from 0 to 1
% which corresponds to a value of 1 in the diff
A_P1 = sum(diff([binary_keys, binary_keys(:,1)], [], 2) == 1, 2) == 1;
LUT_Step1 = LUT_Step1 & (A_P1 == 1);

% At least one of P2 and P4 and P6 is white (0)
LUT_Step1 = LUT_Step1 & ~all(binary_keys(:,[1,3,5]), 2);

% At least one of P4 and P6 and P8 is white (0)
LUT_Step1 = LUT_Step1 & ~all(binary_keys(:,[3,5,7]), 2);

I haven't manually verified all of the values in the LUT, but from a spot check it seems to be correct. On my machine, a 3GHz Core i5, generating the table took about 0.7-0.8 msec. You could, of course, hardcode the resulting table in your script if you wished.

Once you have the lookup table and the kernel, checking the conditions is pretty easy. Just encode the current image, and then look up the encoded value in the table and if it's 1, then the image should be 0 at that location. (We don't really need to make sure that the image is 1 (black) before applying the other conditions, because changing 0 to 0 doesn't change the outcome.)

clear all;
close all;

% generate the lookup table
zs_lookup_table;

img=imread("https://i.sstatic.net/ejNSg.png");
imshow(img);
colorbar();

% threshold img
figure(2)
imgSeuil = img < 30;
imshow(imgSeuil);

encoding_kernel = [
   1   128    64
   2     0    32
   4     8    16
   ];


% use "valid" in filter2 to ensure each pixel has 8 neighbors
% valid region of image is 1 pixel smaller on each side,
% so we'll need to adjust when we recalculate imgSeuil
encoded_img = filter2(encoding_kernel, imgSeuil, "valid");
% use lookup table to determine which pixels satisfy conditions 1-4
conds1_4 = false(size(imgSeuil));
conds1_4(2:end-1, 2:end-1) = LUT_Step1(encoded_img+1);  % convert range to 1..256
imgSeuil(conds1_4) = 0;  % no need to explicitly check for pixel == 1

figure(3)
imshow(imgSeuil);

The timings for each pass through Step 1 varied a bit more then generating the table, probably because I updated the image between iterations and there were fewer pixels changing in later iterations. Each pass took between 0.7 and 1.7 msec. Again, I didn't code Step 2 and I didn't check for changes between iterations, but adding it all together you should be able to reach equilibrium in well under a second.