I have been working on logits for a while for which I need to use the built-in exp() function. This seems to be running slower for a large matrix compared to using a for loop with smaller chunks of the same matrix.
In the MATLAB documentation and on several forums, it is always recommended to vectorize code in order to speed it up. But this doesn't seem to be the case here.
n = 32;
rows = 50000;
cols = 32;
a = rand(n*rows, cols);
b = rand(rows, cols);
% in a loop
tic
for i=1:n
d = exp(b);
end
toc
% big matrix
tic
d = exp(a);
toc
I expected the first tic-toc to be slower than the second. But the output I got was as follows:
Elapsed time is 0.335781 seconds.
Elapsed time is 0.390191 seconds.
Any idea as to why this is the case would be helpful.
Edit 1: Let's say I edit my code like this so as to use random values every time:
n = 32;
rows = 50000;
cols = 32;
% in a loop
tic
for i=1:n
d = exp(rand(rows, cols));
end
toc
% big matrix
tic
e = exp(rand(n*rows, cols));
toc
return
I still get:
Elapsed time is 0.745808 seconds.
Elapsed time is 0.847162 seconds.
You are not making a fair comparison: your "loop" case uses much less memory, reading the same smaller array many times, compared to the "big matrix" case that reads a much larger chunk of memory, and writes to a much larger chunk of memory. Reading from main memory is a bottleneck, and so being able to utilize the cache helps speed up the "loop" code.
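To put rough numbers on that difference, here is a small sketch that computes the size of the two arrays from the question. It assumes the default double class (8 bytes per element); the variable names bytesSmall and bytesBig are only for illustration. Note that exp also writes an output of the same size as its input, so each case touches about twice this much memory.

```matlab
% Sizes taken from the question
n = 32;
rows = 50000;
cols = 32;
bytesPerDouble = 8;  % assumption: arrays are of class double

% Array read on every loop iteration (b) versus the single big array (a)
bytesSmall = rows * cols * bytesPerDouble;
bytesBig = n * rows * cols * bytesPerDouble;

fprintf('b: %.1f MB, a: %.1f MB\n', bytesSmall/1e6, bytesBig/1e6);
% prints "b: 12.8 MB, a: 409.6 MB"
```

Whether the smaller array actually stays in cache depends on the machine, but it is 32 times smaller than the big one either way.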
This is a fairer comparison:
rows = 32*50000;
cols = 32;
a = rand(rows, cols);
% in a loop
tic
d = zeros(size(a));
for i=1:cols
d(:,i) = exp(a(:,i));
end
toc
% big matrix
tic
d = exp(a);
toc
I see (MATLAB R2019a Online):
Elapsed time is 0.208699 seconds.
Elapsed time is 0.140489 seconds.
So the loop is slower, but not by all that much. Over the last 15 years or so, MathWorks has been steadily improving MATLAB's JIT. Before the JIT was introduced, the loop code would easily have been ~100 times slower.
NB: Always make sure to run a timing script at least twice and discard the first timings. The first time you run some code, the JIT needs to compile it, which increases the measured time.
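One way to follow that advice inside a single script is to repeat each measurement and ignore the first run. The following is a minimal sketch, not a rigorous benchmark: nRuns, tLoop, tVec, d1 and d2 are names made up for this example, and five repetitions is an arbitrary choice.

```matlab
rows = 32*50000;
cols = 32;
a = rand(rows, cols);

nRuns = 5;                 % arbitrary number of repetitions
tLoop = zeros(1, nRuns);   % timings for the column-by-column loop
tVec = zeros(1, nRuns);    % timings for the single vectorized call

for r = 1:nRuns
    % in a loop, one column at a time
    tic
    d1 = zeros(size(a));
    for i = 1:cols
        d1(:,i) = exp(a(:,i));
    end
    tLoop(r) = toc;

    % big matrix
    tic
    d2 = exp(a);
    tVec(r) = toc;
end

% Discard the first run (it may include JIT compilation) and report the median of the rest
fprintf('loop: %.4f s, vectorized: %.4f s\n', median(tLoop(2:end)), median(tVec(2:end)));
```

MATLAB also ships a timeit function that takes a function handle and handles warm-up and repetition itself, which may be more convenient when the code under test is already wrapped in a function.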
Regarding Edit 1:
Here you are still dealing with smaller arrays in the "loop" case. Not much changes, except that you've moved the rand call inside the loop. Temporary memory is re-used.