For a brief report I have to do, our class ran code on a cluster using both gcc -O0 and icc -O0. We found that gcc was about 2.5 times faster than icc without any optimizations? Why is this? Does gcc -O0 actually do some minor optimization or does it simply happen to work better for this system?
The code was an implementation of the naive string searching algorithm found here, written in c.
Thank you
A few things to take into account:
The instruction set each compiler uses by default. For example if your GCC build produces i686 code by default, while ICC restricts itself to i586 opcodes, you would probably see a significant performance difference.
The actual CPUs in your cluster. If you are using AMD processors, instead of Intel CPUs, then ICC is at a disadvantage because it is, of course, targeted specifically to Intel processors.
You mentioned using a cluster. Does this speed difference exist on a single processor as well? If you used any parallelisation facilities provided by your compiler, there could be significant differences there.
Simplistically, when optimisations are disabled, the compiler uses pre-made "templates" for each code construct. Since these templates are intended to be optimised afterwards, they are constructed in a way that enables the optimisation passes to produce better code. The fact that they may be slower or faster with -O0
does not really mean anything - for example, more explicit initial code could be easier to optimise but far slower to execute.
That said, the only way to find out what is going on is to profile the execution of your code and, if necessary, have a look at the assembly of those parts of the code where the major differences lie.