using -O2 decreases time of bubble sort C program

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void sort();

int main() {
    int i;
    for (i = 0; i < 100000; i++) {
        sort();
    }
}

void sort() {
    int i, j, k, array[100], l = 99, m;
    for (i = 0; i < 100; i++) {
        array[i] = rand() % 1000 + 1;
    }
    for (k = 0; k < 99; k++) {
        for (j = 0; j < l; j++) {
            if (array[j + 1] > array[j]) {
                int temp = array[j];
                array[j] = array[j + 1];
                array[j + 1] = temp;
            }
        }
        l--;
    }
    for (m = 0; m < 100; m++) {
        printf("%d ", array[m]);
    }
}

On the linux shell, gcc sort -o sort.c and then time ./sort >> out. Here if I do gcc -o2 sort -o sort.c and similarly o3 and o4 then the time keeps on decreasing. How does the optimization options work? Please explain in terms of all real time, user time and system time.

PS: The code might be a little inefficient. Kindly ignore that.

Solution

Optimization options work between the reading of the source code and the writing of the binary instructions to the CPU.

GCC is a multi-phase compiler, where the phases roughly consist of:

Creating "tokens" from the input text.
Arranging those tokens into abstract syntax tree structure.
Pruning the abstract syntax tree.
Creating register based instructions, assuming an infinite number of CPU registers.
Mapping the registers into the actual registers available.
Writing the binary information out, in the loader's expected format.

Optimizations can impact a number of locations, typically they become active in the above mentioned steps 3 through 5. There are many optimizations, including:

Constant folding – Evaluate constant subexpressions in advance.
Strength reduction – Replace slow operations with faster equivalents.
Null sequences – Delete useless operations.
Combine operations – Replace several operations with one equivalent.
Algebraic laws – Use algebraic laws to simplify or reorder instructions.
Special case instructions – Use instructions designed for special operand cases.
Address mode operations – Use address modes to simplify code.
Loop unrolling - Replace a loop with equivalent instructions
Partial loop unrolling - Reduce times a loop is evaluated while preserving overall function.

Note that these are not all the optimizations that might be performed, but it starts to give you an idea.

For example, if the compiler sees

int s = 3;
while (s < 6) {
   printf("%d\n", s);
   s++;
}

and the flags are set to unroll loops, then it might write CPU instructions equivalent to

printf("%d\n", 3);
printf("%d\n", 4);
printf("%d\n", 5);

Those instructions might seem more wordy to us humans, but the CPU commands might be smaller, because there is no need to "lookup" the now-erased value of s, nor is there the need to add one to it, or store the new updated value back into RAM.

GCC arranges the optimizations into categories, ranging from "safe" to "risky". -O2 is a good compromise between speed and safety. Higher -O numbers are riskier.