Tags: performance, matlab, parallel-processing, batch-processing, parfor

parfor loop vs. for loop and batch processing in MATLAB


I want to use the Parallel Computing Toolbox in MATLAB. I searched a lot to learn about it and found that a common question is why parfor is slower than a for loop. The usual explanation I found concerns whether the pool was already started with matlabpool/parpool.

Here is the code I ended up with, but it is still too slow, and I can't figure out why. I'm surprised that MathWorks hasn't documented this problem very well. Any suggestions for getting started with PCT, especially parfor and batch processing: when to start and stop the pool, and how many workers are sufficient for a given set of operations?

My Code:

matlabpool open
tic ; parfor i=1:4, disp(['myid is ' num2str(labindex) '; i = ' num2str(i)]),end 
toc;
tic ; for i=1:4, disp(['myid is ' num2str(labindex) '; i = ' num2str(i)]),end 
toc;
matlabpool close

Output:

Starting matlabpool using the 'local' profile ... connected to 2 workers.
myid is 1; i = 2
myid is 1; i = 1
myid is 1; i = 3
myid is 1; i = 4
Elapsed time is 2.974505 seconds.
myid is 1; i = 1
myid is 1; i = 2
myid is 1; i = 3
myid is 1; i = 4
Elapsed time is 0.010254 seconds.
Sending a stop signal to all the workers ... stopped.

EDIT 1:

I've also seen the suggestion to linearize nested loops. I tried it, but it is also slower than a normal nested for loop:

tic;
for a=1:4
    for b=1:5
        f(a)=sum(a,b);
    end
end
toc;
tic;
iterations=[5,4];
for ix=1:prod(iterations)
    [b,a]=ind2sub(iterations,ix);
    f(a)=sum(a,b);
end
toc;

Output:

Elapsed time is 0.013108 seconds.
Elapsed time is 0.017800 seconds.

EDIT 2:

This is the code I want to run in parallel:

parfor ii = 1:1000
   p{ii,1}= [2 5 4; 5 4 6;]; %suppose very big matrix
   pp{ii,1}=p{ii,1}*2;
   for jj = 1:100
      p1{jj,1} = p{ii}* pp{ii};
      p2{jj} = p{ii}* pp{ii}*p1{jj};
      p3{jj} = p{ii}* pp{ii}*p1{jj}*p2{ii};
   end
   Data(ii).data=([pp(ii,:),p1{:,1},p2,p3;])' ; %#'
   Data(ii).label=cellfun(@(x) ['label' num2str(ii)] , num2cell(1:length(pp)+length(p1)+length(p2)+length(p3))', 'UniformOutput', false);
end

Solution

  • The time for the parfor is concerning: 3 seconds is very slow. You should try running this code several times to see whether the first run is especially long (I would expect to see 0.1 or 0.05 seconds or less, as compared to a regular for loop).

    However, my main concern is that you need to do substantial work inside the loop to get any kind of speedup from parfor. Consider doing a very large fft(...) or matrix decomposition that takes ~0.5-3.0 seconds per iteration. Then you should start to see some improvement. A speedup of 1.5x-1.75x would be nice with 2 workers.

    You also might consider limiting the use of disp inside parfor to avoid possible competition for the console.

    Finally, when doing benchmarks, be sure not to run a lot of other programs while you work. I've noticed that some web pages can substantially slow down benchmark performance by using quite a bit of CPU. Overall, though, these times are really, really slow, especially the parfor.
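
The advice above could be tried with something like the following sketch. It is only an illustration, not a definitive benchmark: it assumes the Parallel Computing Toolbox on R2013b or later (where parpool and gcp replace matlabpool), and the matrix size n, the iteration count N, and the choice of svd as the "substantial work" are arbitrary assumptions, not values from the question. The pool is started outside the timed region, disp is kept out of the loops, and the parfor is timed three times so that a slow first run is visible.

```matlab
% Hedged benchmark sketch: for vs. parfor with real work per iteration.
% Assumes Parallel Computing Toolbox, R2013b+ (parpool/gcp).
if isempty(gcp('nocreate'))      % start the pool once, outside the timing
    parpool;                     % default profile; worker count is machine-dependent
end

n = 400;                         % matrix size (assumed; raise it for heavier work)
N = 16;                          % number of loop iterations (assumed)
A = rand(n);

sSer = zeros(1, N);              % results of the serial loop
sPar = zeros(1, N);              % results of the parallel loop

tic;
for k = 1:N
    sSer(k) = max(svd(A + k));   % matrix decomposition as the per-iteration work
end
tSerial = toc;

tPar = zeros(1, 3);
for r = 1:3                      % repeat: the first parfor run is often the slowest
    tic;
    parfor k = 1:N
        sPar(k) = max(svd(A + k));
    end
    tPar(r) = toc;
end

fprintf('for: %.3f s, parfor runs: %s s\n', tSerial, mat2str(tPar, 3));
```

If the later parfor runs are not faster than the for loop, increasing n (so each iteration takes longer) is the first thing to try, since the per-iteration work has to outweigh the communication overhead.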