Tags: c, while-loop, undefined-behavior, post-increment

Is `while(*p++ = *(p+1));` undefined behavior?


I have code that manipulates a C string with a one-line while loop. It works as expected when compiled with MSVC2015, but gives a different result when compiled with TDM-GCC (gcc (tdm-1) 5.1.0).

Here is a minimal example that shows the problem. The code overwrites the current char with the next char, repeating until it sets the current char to \0.

#include <stdio.h>

int main()
{
    char buf[999] = "Foobar", *p = buf;
    while(*p++ = *(p+1));
    printf("buf = %s\n", buf);
    return 0;
}

When the code is compiled with MSVC2015, the output is buf = oobar as expected. With TDM-GCC, however, the output is buf = obar.

If I change the while statement to while(*p = *(p+1)) { ++p; }, both compilers give the expected result buf = oobar. It seems that by putting the post-increment operator inside the expression, I have somehow triggered undefined behavior.

My question is: why does the code behave differently when compiled with different compilers? Is it wrong (or non-standard) to put an increment operator inside a non-trivial while statement? Did I trigger undefined behavior? If so, how should the code behave according to the C standard? If not, who is to blame here: TDM-GCC or MSVC?

UPDATE: For future readers who have the same doubt: yes, the code invokes UB. A well-defined way to write it is: while(*p = *(p+1)){++p;}
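
As a minimal sketch of the well-defined form above, the two helpers below drop the first character of a string in place. The names shift_left_loop and shift_left_memmove are hypothetical and not part of the original code. The second one swaps the hand-written loop for the standard memmove, which is specified to work on overlapping regions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: drop the first character of s in place,
   using the well-defined loop (increment in its own statement). */
static void shift_left_loop(char *s)
{
    assert(s != NULL);
    if (*s == '\0')
        return;                      /* empty string: avoid reading past the terminator */
    while ((*s = *(s + 1)) != '\0')
        ++s;                         /* sequenced after the assignment has completed */
}

/* Hypothetical helper: same effect via memmove, which allows
   the source and destination to overlap. */
static void shift_left_memmove(char *s)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    if (len > 0)
        memmove(s, s + 1, len);      /* len - 1 characters plus the terminating '\0' */
}
```

Called on a buffer holding "Foobar", either helper should leave "oobar" regardless of compiler, because p is never read and modified within the same unsequenced expression.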


Someone asked why we would want to write code like this. Here is a scenario where this idiom can be useful.

#include <stdio.h>
#include <Windows.h>

static void EscapeDquote(char * const sz)
{
    char *p = sz;
    BOOL bs = FALSE;
    for (; *p; ++p)
    {
        if (*p == '\\') {
            bs = !bs;
            continue;
        }
        if (*p == '\"') {
            if (bs) {
                /*
                    discard prev char (backslash before dquote)
                    overwrite with next char until null-termi
                */
                char *q = --p;
                /* OLD version: undefined behavior (q is modified and read
                   in the same expression with no sequencing) */
                /* while(*q++ = *(q+1)); */
                /* Well-defined version, gives the same result on MSVC and GCC: */
                while(*q = *(q+1)){++q;}
            }
        }
        bs = FALSE;
    }
}

int main()
{
    /* "call \"D:\foo bar.exe\" */
    char szTest[] = "call \\\"D:\\foo bar.exe\\\"";
    printf("Before = %s\n", szTest);
    EscapeDquote(szTest);
    printf("After  = %s\n", szTest);
    return 0;
}

Solution

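  • Yes, it is undefined behaviour. The expression modifies p (through p++) and reads p (in p+1), and nothing sequences the two relative to each other. Clang flags this with the following warning: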

    source_file.cpp:6:13: warning: unsequenced modification and access to 'p' [-Wunsequenced]
        while(*p++ = *(p+1));
                ^      ~
    

    C11, 6.5 Expressions, paragraph 2:

    If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.