I have a newbie question, guys.
Let's suppose that I have the following simple nested loop, where m and n are not necessarily the same, but are really big numbers:
x = 0;
for (i = 0; i < m; i++)
{
    for (j = 0; j < n; j++)
    {
        delta = CalculateDelta(i, j);
        x = x + j + i + delta;
    }
}
And now I have this:
x = 0;
for (i = 0; i < m; i++)
{
    for (j = 0; j < n; j++)    // unrolled by 2; note this only works if n is even
    {
        delta = CalculateDelta(i, j);
        x = x + j + i + delta;
        j++;
        delta = CalculateDelta(i, j);
        x = x + j + i + delta;
    }
}
Rule: I do need to go through every iteration of the loop (every (i, j) pair), because of this delta calculation.
My questions are:
1) Is the second algorithm faster than the first one, or is it the same? I ask because, to me, the first algorithm has a complexity of O(m * n) and the second one O(m * n/2). Or does lower complexity not necessarily make it faster?
2) Is there any other way to make this faster without something like a Parallel.For?
3) If I make use of a Parallel.For, would it really be faster, since I would probably need a synchronization lock on the x variable?
Thanks!
Asking three questions in a single post is pushing the limits of "too broad". But in your case, the questions are reasonably simple, so…
1) Is the second algorithm faster than the first one, or is it the same? I ask because, to me, the first algorithm has a complexity of O(m * n) and the second one O(m * n/2). Or does lower complexity not necessarily make it faster?
Complexity ignores constant coefficients like 1/2, so there's no difference between O(m * n) and O(m * n/2). The latter should have been reduced to O(m * n), which is obviously the same as the former.
And the second version isn't really O(m * n/2) anyway, because you didn't remove any work. You just partially unrolled your loop. These kinds of meaningless transformations are one of the reasons we ignore coefficients in big-O notation in the first place: it's too easy to fiddle with the coefficients without changing the actual computational work.
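If you want to convince yourself of that, count the calls. This is hypothetical instrumentation (the wrapper and counter are mine, not from your post), and it assumes CalculateDelta returns a numeric type:

long calls = 0;
long CalculateDeltaCounted(int i, int j)
{
    calls++;                      // count every delta computation
    return CalculateDelta(i, j);
}

Substituting CalculateDeltaCounted into either version of the loop leaves calls == (long)m * n in both cases (assuming n is even, which your unrolled version requires anyway), so no work was removed.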
2) Is there any other way to make this faster without something like a Parallel.For?
That's definitely too broad a question. "Any other way"? Probably. But you haven't provided enough context.
The only obvious potential improvement I can see in the code you posted is that you are computing j + i repeatedly. That component of the whole computation increases by 1 with each iteration, so you could keep a separate incrementing variable instead of adding i and j each time (see the sketch below). But a) it's far from clear that making that change would speed anything up (whether it would depends a lot on the specifics of the CPU's own optimization logic), and b) if it did help reliably, it's possible that a good optimizing JIT compiler would make that transformation to the code for you anyway.
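For what it's worth, here's a minimal sketch of that idea, assuming the m, n, and CalculateDelta from your post (the numeric types are my guesses, since your post doesn't declare any):

long x = 0;
for (int i = 0; i < m; i++)
{
    long sum = i;                     // value of i + j when j == 0
    for (int j = 0; j < n; j++)
    {
        long delta = CalculateDelta(i, j);
        x = x + sum + delta;
        sum++;                        // i + j grows by 1 each inner iteration
    }
}

Whether replacing an addition with an increment is measurable at all is exactly the kind of thing you'd have to benchmark rather than guess at.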
But beyond that, the CalculateDelta() method is a complete unknown here. It could be a simple one-liner that the compiler inlines for you, or it could be some enormous computation that dominates the whole loop.
There's no way for any of us to tell you if there is "any other way" to make the loop faster. For that matter, it's not even clear that the change you made makes the loop faster. Maybe it did, maybe it didn't.
3) If I make use of a Parallel.For, would it really be faster, since I would probably need a synchronization lock on the x variable?
That at least depends on what CalculateDelta() is doing. If it's expensive enough, then the synchronization on x might not matter. The bigger issue is that each calculation of x depends on the previous one, so the computation as you've written it is inherently serial and can't be parallelized as-is.
What you could do is compute all the deltas in parallel, since they don't depend on x (at least, they don't in the code you posted). The other part of the sum is a constant for known m and n (the total contributed by all the i + j terms), so in the end x is just the sum of the deltas plus that constant. Again, whether this is worth doing depends somewhat on how costly CalculateDelta() is: the less costly that method is, the less likely you are to see much, if any, improvement by parallelizing it.
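If you do try it, here's a hedged sketch of that approach, again assuming the m, n, and CalculateDelta from your post. It uses the thread-local-state overload of Parallel.For so the synchronization (Interlocked.Add) happens once per thread instead of once per iteration:

using System.Threading;
using System.Threading.Tasks;

long x = 0;

// The i + j terms don't involve the deltas at all; their total is a
// closed-form constant for known m and n.
long constant = (long)n * m * (m - 1) / 2 + (long)m * n * (n - 1) / 2;

Parallel.For(0, m,
    () => 0L,                                      // per-thread running sum
    (i, loopState, localSum) =>
    {
        for (int j = 0; j < n; j++)
            localSum += CalculateDelta(i, j);      // deltas are independent
        return localSum;
    },
    localSum => Interlocked.Add(ref x, localSum)); // one merge per thread

x += constant;

The point of this design is that each thread folds its own partial sum and only touches the shared x once at the end, so contention on x stops being the concern you were worried about.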