Search code examples
cgccundefined-behavior

"*s = 0" being optimized out. Possible GCC 13 bug? Or some undefined behaviour?


With GCC 13.2, the output of the following code depends on the optimization level:

#include <ctype.h>
#include <stdio.h>

char *SkipAName(char *s) {
  if (('A' <= *s && *s <= 'Z') || ('a' <= *s && *s <= 'z') || *s == '_' ||
      *s == '$') {
    if (*s == '$') {
      s++;
    }
    while (isalnum(*s)) {
      s++;
    }
    if (*s == '_') {
      s++;
    }
  }
  return s;
}

int TestName(char *name) {
  while (*name) {
    name++;
  }
  return 0;
}

int StrICmp(char *s1, char *s2) {
  while (*s1 && tolower(*s1) == tolower(*s2)) {
    s1++;
    s2++;
  }
  return tolower(*s1) - tolower(*s2);
}

int DoTable(char *s) {
  char *name, c;

  do {
    name = s;
    s = SkipAName(s);
    c = *s;
    *s = 0;
    TestName(name);
    *s = c;
    if (*s == '(') {
      break;
    }
    if (*s != ',') {
      printf("Error 1\n");
      return 1;
    }
    *s = 0;

    if (StrICmp(name, "sparse") == 0) {
    } else {
      printf("Error 2\n");
      return 1;
    }
    *s++ = ',';
    while (*s == ',') {
      s++;
    }
  } while (*s);

  printf("OK\n");
  return 0;
}

int main() {
  char buf[] = "sparse,C(1)";
  DoTable(buf);
  return 0;
}
$ gcc-13 -O0 test.c && ./a.out
OK
$ gcc-13 -O1 test.c && ./a.out
OK
$ gcc-13 -O2 test.c && ./a.out
Error 2
$ gcc-13 -O3 test.c && ./a.out
Error 2

The code comes from this project and I have tried to make a minimal reproducible example; that is why this code may look awkward. See also this issue.

I am wondering if this is a bug in GCC 13 or if I have hit some undefined behaviour. With Compiler Explorer, it looks like somehow *s = 0 at line 52 is optimized away. The code works fine with GCC 12.3.


Solution

  • I am wondering if this is a bug in GCC 13 or if I have hit some undefined behaviour.

    • The arguments to isalnum() and tolower() should be chars that have been converted to type unsigned char (before automatically being converted to int to match the parameter type). Example: isalnum((unsigned char) *s). That issue could cause a similar program to manifest UB, but the input provided by the particular program in your example will not have that effect.

    • The code assumes that the uppercase Latin letters are encoded as a contiguous range of numbers in the execution character set, and the same for the lowercase Latin letters. C does not guarantee that this will be so, and if it isn't, then the program will not work as intended. However, that issue is not at all likely to be a problem for you in your test environment.

    • SkipAName() also seems to have weird rules for what it considers a name to skip, but that's not erroneous.

    • TestName() has no caller-observable effect. But that's not erroneous, and if the observed misbehavior depends on that function and the call to it being present in the program then that's very curious.

    • StrICmp() can run off the end of the second string, thus producing UB, but that will not happen with your particular input.

    • It's a bit nasty that DoTable() modifies its input, and especially that it does not consistently reverse its changes, since they seem intended only to serve its internal purposes. Among other things, that makes it unsafe to use with string literals. But your example input is not a string literal, and it is not inherently erroneous that DoTable() modifies it.

    • I do not like DoTable()'s use of parameter s. I am not altogether against functions modifying their parameters, but I am very much for using descriptive names. It's altogether unclear what the name "s" means that is both adequately descriptive and consistent with all the uses to which it is put. But of course, that doesn't make the code erroneous.

    Overall, this code has some issues, and I dislike it overall, but I see no undefined behavior in it as presented. The outputs you show for GCC 13 builds at higher optimization levels are wrong. That is, they reflect a bug in GCC 13.