
C block scoping


I am trying to understand the implications of block scoping in C.

I realise that identifiers defined within a scope are invisible outside the scope, but what are the implications of block scoping at an instruction level? Does entry into or exit from a block scope imply any instructions, or is it entirely transparent at an instruction level? Are variables defined inside a scope destroyed, like they are within a loop construct?

At an instruction level, after optimizing, is the following:

initialise:
    int a = 0;
block_entry:
    a += 1;
    /* on first pass (initialisation): a == 1 */
    /* on second pass (entry by goto): a==2 ? */
    if (a==2): goto done

goto block_entry
done:

any different from:

{
initialise:
    int a = 0;
block_entry:
    a += 1;
    /* on first pass (initialisation): a == 1 */
    /* on second pass (entry by goto): a==2 ? */
    if (a==2): goto done
}

goto block_entry
done:

or from:

while(1){
initialise:
    int a = 0;
block_entry:
    a += 1;
    /* on first pass (initialisation): a == 1 */
    /* on second pass (entry by goto): a == 2 ? */
    if (a==2): goto done
    goto main_code
}

main_code:
goto block_entry
done:

The question is largely academic and inspired by Eli Bendersky's post "Computed goto for efficient dispatch tables", where he seems to use a while(1) {...} loop purely for visual structuring (specifically in the interp_cgoto(...) function).

Would his code compile any differently if he used a block scope for visual structuring, or no scoping at all (i.e. removed the while(1) {...} loop)?


Solution

  • The behaviour of snippets two and three is undefined because the lifetime of variable a ends when the block in which it is declared is exited (even if the exit is by means of a goto). When the block is re-entered, a new instance of a is created, with an indeterminate initial value. Since the goto skips the declaration and its initializer, the value of a remains indeterminate. Subsequently using that value (a += 1;) results in undefined behaviour.

    Here's an example which actually seems to demonstrate the undefined behaviour in practice:

    #include <stdio.h>
    #include <stdlib.h>
    
    int main(int argc, char** argv) {
        {
    initialise:;
            int a[10] = {0};
    block_entry:
            a[0] += 1;
            printf("a is %d\n", a[0]);
            /* on first pass (initialisation): a == 1 */
            /* on second pass (entry by goto): a==2 ? */
            if (a[0]>=2) goto done;
        }
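        /* Second block: its array may be given the same stack slot as a. */
        {
            int x[10];
            x[0] = argc > 1 ? atoi(argv[1]) : 42;
            printf("x is %d\n", x[0]);
        }
    
        /* Jumps back into the first block, past a's initializer: a[0] is
           now indeterminate, so reading it is undefined behaviour. */
        goto block_entry;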
    done:
        puts("Done");
        return 0;
    }
    

    (Live on coliru)

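    I fixed a couple of typos (where the pseudocode was a mix of C and Python) and added another block where the stack might be reused. The null statement in initialise:; is needed because, before C23, a label must be followed by a statement, and a declaration is not a statement. I also changed the termination condition to >=, so that the program still terminates if a[0] has already passed 2 on the second pass.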

    With the particular gcc version and options used there, a[0] and x[0] end up sharing storage, so the second time through the loop a is 43 instead of 2: a[0] += 1 operates on the 42 that the second block stored in x[0].

    If you make the arrays smaller, gcc doesn't put them at the same stack location, and you get what the original snippet expects, where a is 2 on the second pass.

    On the other hand, if you use -O3 instead of -O0, then gcc compiles an endless loop where a is always 1.

    All of these results are acceptable, because undefined behaviour puts no constraints on the compiler.

    In short, Don't Do That (sm).