c optimization branch pipeline microprocessors

Explanation for this specific case of pipeline stall

I'm a newbie to embedded systems. I'm trying to learn and improve as much as I can. I'm stuck at a point and it's probably OK to just ignore and move on. However, I can't help but let it tickle my mind. I feel like I have to understand this.

On my way, one of the best resources I was able to find is this YouTube playlist by Quantum Leaps, LLC: Embedded Systems Programming Course

In Lesson 2 after the time 6.12, Mr. Samek explains some stuff regarding pipeline stalls. At the first glance, I didn't understand a word of it. Then I did some research and readings about pipelines, bubbles, branches etc. to get familiar with concepts and to understand basic mechanisms behind them. I then watched the video again but I still don't understand how the second case is faster.

Edit to make the question self-contained as much as possible:

In the video, the code is written as follows:

    int main() {
        int counter = 0;
        while (counter < 20) {
            ++counter;
        }

        return 0;
    }

After the compilation code gets this structure:

    ...
    ++counter; // Let's call this line "Line X"
    while (counter < 20) {
        // Assembly instructions to go back to Line X
    }
    ...

If I'm not mistaken, Mr. Samek says second code is faster since it avoids a branch. But I don't understand how.

Any help (and advice to learn & improve) is appreciated, thanks in advance!

Solution

The code as written in C has two branches :

a conditional branch when testing the while loop condition
an unconditional branch to jump back to the beginning of the loop after the loop body

The generated assembly reorders the code, in order to eliminate the unconditional branch, and only need the conditional branch.

This elimination of a branch instruction results in the speedup that the video author mentions. Refer to the flow images shown in the video to see the difference between the two.

The part about pipeline stalls and branch prediction is unrelated to this point, and instead talks about another thing to be aware of : every conditional branch can potentially lead to pipeline stalls, so limiting the amount of conditional branches can be advantageous.