c gcc clang undefined-behavior

Clang 13 -O2 produces weird output while gcc does not


Can someone explain why the following code is optimized strangely by Clang 13 with the -O2 flag? With lower optimization settings in Clang, and with every optimization setting in GCC, I get the expected output "John: 5". With clang -O2 or higher, however, I get ": 5". Does my code have undefined behavior that I am not aware of? Strangely enough, if I compile with -fsanitize=undefined, the code works as expected. How should I go about diagnosing an issue like this? Any help is greatly appreciated.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

typedef size_t usize;

typedef struct String {
    char *s;
    usize len;
} String;

String string_new(void) {
    String string;
    char *temp = malloc(1);
    if (temp == NULL) {
        printf("Failed to allocate memory in \"string_new()\".\n");
        exit(-1);
    }
    string.s = temp;
    string.s[0] = 0;
    string.len = 1;
    return string;
}

String string_from(char *s) {
    String string = string_new();
    string.s = s;
    string.len = strlen(s);
    return string;
}

void string_push_char(String *self, char c) {
    self->len = self->len + 1;
    char *temp = realloc(self->s, self->len);
    if (temp == NULL) {
        printf("Failed to allocate memory in \"string_push_char()\".\n");
        exit(-1);
    }
    self->s[self->len - 2] = c;
    self->s[self->len - 1] = 0;
}

void string_free(String *self) {
    free(self->s);
}

int main(void) {
    String name = string_new();
    string_push_char(&name, 'J');
    string_push_char(&name, 'o');
    string_push_char(&name, 'h');
    string_push_char(&name, 'n');

    printf("%s: %lu\n", name.s, name.len);

    string_free(&name);

    return 0;
}

Solution

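  • Your string_push_char calls realloc but then keeps using the old pointer, self->s; the returned temp is only null-checked and then discarded. This often appears to work when the reallocation happens in place, but it is undefined behavior: once realloc succeeds, the old pointer is no longer valid, and if the block was moved, the writes land in freed memory.

    Clang has a (controversial) optimization that relies on this: it assumes that the pointer passed to realloc always becomes invalid, because you're supposed to use the returned pointer instead. That would explain why the problem only shows up at -O2 and above.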

    The solution is to assign temp back to self->s after the null check.

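    As a side note, your string_from is broken badly enough that you should remove it and rethink it from scratch: it leaks the buffer allocated by string_new, stores a pointer that the String does not own (so a later realloc or free on it is invalid unless s came from malloc), and sets len to strlen(s), which excludes the terminator that string_new counts.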
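
For reference, here is a minimal sketch of what the fix might look like. The string_push_char change is the one described in the answer (store temp back into self->s). The string_from shown here is only one possible redesign, not something from the original post: it assumes the String should own a private copy of its input and that len keeps counting the terminator, as string_new does. The assert and the switch to stderr and EXIT_FAILURE are likewise illustrative choices.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct String {
    char *s;     /* heap-allocated, NUL-terminated buffer owned by the String */
    size_t len;  /* bytes in use, including the terminator (as in string_new) */
} String;

/* Append one character, growing the buffer by one byte. */
void string_push_char(String *self, char c) {
    assert(self->len >= 1);  /* assumed invariant: the terminator is always counted */

    char *temp = realloc(self->s, self->len + 1);
    if (temp == NULL) {
        fprintf(stderr, "Failed to allocate memory in \"string_push_char()\".\n");
        exit(EXIT_FAILURE);
    }
    self->s = temp;  /* the fix: adopt the pointer realloc returned */
    self->len += 1;
    self->s[self->len - 2] = c;     /* overwrite the old terminator */
    self->s[self->len - 1] = '\0';  /* write the new terminator */
}

/* Assumed redesign: copy the input so the String owns memory it may
 * later realloc or free. */
String string_from(const char *s) {
    String string;
    string.len = strlen(s) + 1;  /* count the terminator, like string_new */
    string.s = malloc(string.len);
    if (string.s == NULL) {
        fprintf(stderr, "Failed to allocate memory in \"string_from()\".\n");
        exit(EXIT_FAILURE);
    }
    memcpy(string.s, s, string.len);  /* copies the terminator too */
    return string;
}
```

With these definitions, building a String from "Joh" with string_from and pushing 'n' leaves s pointing at "John" with len equal to 5, and the buffer can be passed to free afterwards because it always comes from malloc or realloc.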