Search code examples
c++loopsvariable-declaration

Declaring variables inside loops, good practice or bad practice?


Question #1: Is declaring a variable inside a loop a good practice or bad practice?

I've read the other threads about whether or not there is a performance issue (most said no), and that you should always declare variables as close to where they are going to be used. What I'm wondering is whether or not this should be avoided or if it's actually preferred.

Example:

for(int counter = 0; counter <= 10; counter++)
{
   string someString = "testing";

   cout << someString;
}

Question #2: Do most compilers realize that the variable has already been declared and just skip that portion, or does it actually create a spot for it in memory each time?


Solution

  • This is excellent practice.

    By creating variables inside loops, you ensure their scope is restricted to inside the loop. It cannot be referenced nor called outside of the loop.

    This way:

    • If the name of the variable is a bit "generic" (like "i"), there is no risk to mix it with another variable of same name somewhere later in your code (can also be mitigated using the -Wshadow warning instruction on GCC)

    • The compiler knows that the variable scope is limited to inside the loop, and therefore will issue a proper error message if the variable is by mistake referenced elsewhere.

    • Last but not least, some dedicated optimization can be performed more efficiently by the compiler (most importantly register allocation), since it knows that the variable cannot be used outside of the loop. For example, no need to store the result for later re-use.

    In short, you are right to do it.

    Note however that the variable is not supposed to retain its value between each loop. In such case, you may need to initialize it every time. You can also create a larger block, encompassing the loop, whose sole purpose is to declare variables which must retain their value from one loop to another. This typically includes the loop counter itself.

    {
        int i, retainValue;
        for (i=0; i<N; i++)
        {
           int tmpValue;
           /* tmpValue is uninitialized */
           /* retainValue still has its previous value from previous loop */
    
           /* Do some stuff here */
        }
        /* Here, retainValue is still valid; tmpValue no longer */
    }
    

    For question #2: The variable is allocated once, when the function is called. In fact, from an allocation perspective, it is (nearly) the same as declaring the variable at the beginning of the function. The only difference is the scope: the variable cannot be used outside of the loop. It may even be possible that the variable is not allocated, just re-using some free slot (from other variable whose scope has ended).

    With restricted and more precise scope come more accurate optimizations. But more importantly, it makes your code safer, with less states (i.e. variables) to worry about when reading other parts of the code.

    This is true even outside of an if(){...} block. Typically, instead of :

        int result;
        (...)
        result = f1();
        if (result) then { (...) }
        (...)
        result = f2();
        if (result) then { (...) }
    

    it's safer to write :

        (...)
        {
            int const result = f1();
            if (result) then { (...) }
        }
        (...)
        {
            int const result = f2();
            if (result) then { (...) }
        }
    

    The difference may seem minor, especially on such a small example. But on a larger code base, it will help : now there is no risk to transport some result value from f1() to f2() block. Each result is strictly limited to its own scope, making its role more accurate. From a reviewer perspective, it's much nicer, since he has less long range state variables to worry about and track.

    Even the compiler will help better : assuming that, in the future, after some erroneous change of code, result is not properly initialized with f2(). The second version will simply refuse to work, stating a clear error message at compile time (way better than run time). The first version will not spot anything, the result of f1() will simply be tested a second time, being confused for the result of f2().

    Complementary information

    The open-source tool CppCheck (a static analysis tool for C/C++ code) provides some excellent hints regarding optimal scope of variables.

    In response to comment on allocation: The above rule is true in C, but might not be for some C++ classes.

    For standard types and structures, the size of variable is known at compilation time. There is no such thing as "construction" in C, so the space for the variable will simply be allocated into the stack (without any initialization), when the function is called. That's why there is a "zero" cost when declaring the variable inside a loop.

    However, for C++ classes, there is this constructor thing which I know much less about. I guess allocation is probably not going to be the issue, since the compiler shall be clever enough to reuse the same space, but the initialization is likely to take place at each loop iteration.