MATLAB parfor much slower than for, even with an embarrassingly parallel program

I compared the following two versions. Serial:

N = 500;
M = rand(500,500,N);
R = zeros(500,500,N);
tic
for k = 1:N
    R(:,:,k) = inv(M(:,:,k));
end
toc

Parallel:

N = 500;
M = rand(500,500,N);
R = zeros(500,500,N);
tic
parfor k = 1:N
    R(:,:,k) = inv(M(:,:,k));
end
toc

The serial version runs about 3 times faster than the parallel one, even though I have 4 local cores available and they all seem to be in use. Any thoughts on why this is happening?

Solution

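Remember that many MATLAB operations, especially large linear algebra operations, are intrinsically multithreaded. In this case inv is multithreaded and dominates the run time of your for loop, so the serial version is most likely already using all of your cores. When you convert it to a parfor loop and only the 'local' cluster type is available, parfor has no more computational cores to work with than the for loop had: the workers share the same 4 cores, and by default each worker runs a single computational thread. On top of that, parfor has to transmit data to the workers and back. M and R each hold 500 × 500 × 500 doubles, i.e. 1.25e8 × 8 bytes ≈ 1 GB apiece, so roughly 1 GB of input slices goes out to the workers and another 1 GB of results comes back. The parfor loop therefore has to be slower than the for loop.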
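One way to check the claim that inv is already multithreaded is to time the serial loop twice: once normally, and once with the client restricted to a single computational thread via maxNumCompThreads. This is a minimal sketch; the smaller N is an assumption to keep the run short, and the variable names tMulti, tSingle and oldThreads are illustrative only.

```matlab
% Sketch: measure how much of the serial for-loop speed comes from
% inv's built-in multithreading. N is reduced (assumption) for speed.
N = 100;
M = rand(500,500,N);
R = zeros(500,500,N);

% 1) Default: MATLAB may use all available computational threads.
tic
for k = 1:N
    R(:,:,k) = inv(M(:,:,k));
end
tMulti = toc;

% 2) Restrict the client to one thread; maxNumCompThreads returns
%    the previous setting so it can be restored afterwards.
oldThreads = maxNumCompThreads(1);
tic
for k = 1:N
    R(:,:,k) = inv(M(:,:,k));
end
tSingle = toc;
maxNumCompThreads(oldThreads);   % restore the previous thread limit

fprintf('multithreaded: %.2f s, single thread: %.2f s\n', tMulti, tSingle);
```

If tSingle comes out several times larger than tMulti on a 4-core machine, the plain for loop was already spreading inv across the cores, which leaves parfor with only its data-transfer overhead to add. The exact ratio depends on the machine and the BLAS/LAPACK build.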

In general, if you have only 'local' workers available, parfor can beat for only when MATLAB cannot multithread the body of the loop.
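For contrast, here is a sketch of a loop whose body MATLAB cannot multithread: each iteration runs a purely scalar recurrence (a logistic map, chosen here only as a hypothetical workload) and sends back a single number, so there is little data to transfer. It assumes Parallel Computing Toolbox is installed; N, steps and the 3.9 coefficient are arbitrary illustrative values. This is the kind of loop where local workers can be expected to help, though the actual speedup depends on the machine.

```matlab
% Sketch (hypothetical workload): scalar, single-threaded work per
% iteration with tiny inputs and outputs, the case where parfor can win.
N = 8;           % number of independent tasks (assumption)
steps = 2e6;     % scalar iterations per task (assumption)
x0 = rand(1,N);  % one scalar seed per task

% Start the pool outside the timed region so startup cost is excluded.
if isempty(gcp('nocreate'))
    parpool;
end

tic
outFor = zeros(1,N);
for k = 1:N
    x = x0(k);
    for s = 1:steps
        x = 3.9*x*(1 - x);   % logistic map: purely scalar arithmetic
    end
    outFor(k) = x;
end
tFor = toc;

tic
outParfor = zeros(1,N);
parfor k = 1:N
    x = x0(k);               % sliced input: only one scalar per task
    for s = 1:steps
        x = 3.9*x*(1 - x);
    end
    outParfor(k) = x;        % sliced output: one scalar back per task
end
tParfor = toc;

fprintf('for: %.2f s, parfor: %.2f s\n', tFor, tParfor);
```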