Search code examples
cdynamicprintfpointer-to-pointer

The function only works with a useless printf


I usually try hard and harder to solve myself any bugs I find in my code, but this one is totally out of any logic for me. It works really fine with whatever strings and char separators, but only with that useless printf inside the while of the function, otherwise it prints

-> Lorem

then

-> ▼

and crashes aftwerwards. Thanks in advance to anyone that could tell me what is happening.

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <stdint.h>

char **strsep_(char *str, char ch) {
    // Sub-string length
    uint8_t len = 0;
    // The number of sub-strings found means the same as the position where it will be stored in the main pointer
    // Obviously, the number tends to increase over time, and at the end of the algorithm, it means the main pointer length too
    uint8_t pos = 0;
    // Storage for any found sub-strings and one more byte as the pointer is null-terminated
    char **arr = (char**)malloc(sizeof(char **) + 1);
    while (*str) {
        printf("Erase me and it will not work! :)\n");
        if (*str == ch) {
            // The allocated memory should be one step ahead of the current usage
            arr = realloc(arr, sizeof(char **) * pos + 1);
            // Allocates enough memory in the current main pointer position and the '\0' byte
            arr[pos] = malloc(sizeof(char *) * len + 1);
            // Copies the sub-string size (based in the length number) into the previously allocated space
            memcpy(arr[pos], (str - len), len);
            // `-_("")_-k
            arr[pos][len] = '\0';
            len = 0;
            pos++;
        } else {
            len++;
        }
        *str++;
    }
    // Is not needed to reallocate additional memory if no separator character was found
    if (pos > 0) arr = realloc(arr, sizeof(char **) * pos + 1);
    // The last chunk of characters after the last separator character is properly allocated
    arr[pos] = malloc(sizeof(char *) * len + 1);
    memcpy(arr[pos], (str - len), len);
    // To prevent undefined behavior while iterating over the pointer
    arr[++pos] = NULL;

    return arr;
}

void strsep_free_(char **arr) {
    char **aux = arr;
    while (*arr) {
        free(*arr);
        *arr = NULL;
        arr++;
    }
    // One more time to fully deallocate the null-terminated pointer
    free(*arr);
    *arr = NULL;
    arr++;
    // Clearing The pointer itself 
    free(aux);
    aux = NULL;
}

int main(void) {
    char **s = strsep_("Lorem ipsum four words", ' ');
    char **i = s;
    while (*i != NULL) {
        printf("-> %s\n", *i);
        i++;
    }
    strsep_free_(s);
}

Solution

  • Your program has undefined behavior, which means it may behave in unexpected ways, but could by chance behave as expected. Adding the extra printf changes the behavior in a way the seems to correct the bug, but only by coincidence. On a different machine, or even on the same machine at a different time, the behavior may again change.

    There are multiple bugs in your program that lead to undefined behavior:

    • You are not allocating the array with the proper size: it should have space fpr pos + 1 pointers, hence sizeof(char **) * (pos + 1). The faulty statements are: char **arr = (char**)malloc(sizeof(char **) + 1); and arr = realloc(arr, sizeof(char **) * pos + 1);.

    • Furthermore, the space allocated for each substring is incorrect too: arr[pos] = malloc(sizeof(char *) * len + 1); should read arr[pos] = malloc(sizeof(char) * len + 1);, which by definition is arr[pos] = malloc(len + 1);. This does not lead to undefined behavior, you just allocate too much memory. If your system supports it, allocation and copy can be combined in one call to strndup(str - len, len).

    • You never check for memory allocation failure, causing undefined behavior in case of memory allocation failure.

    • Using uint8_t for len and pos is risky: what if the number of substrings exceeds 255? pos and len would silently wrap back to 0, producing unexpected results and memory leaks. There is no advantage at using such a small type, use int or size_t instead.

    Here is a corrected version:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    char **strsep_(const char *str, char ch) {
        // Sub-string length
        int len = 0;
        // The number of sub-strings found, index where to store the NULL at the end of the array.
        int pos = 0;
        // return value: array of pointers to substrings with an extra slot for a NULL terminator.
        char **arr = (char**)malloc(sizeof(*arr) * (pos + 1));
        if (arr == NULL)
            return NULL;
        for (;;) {
            if (*str == ch || *str == '\0') {
                // alocate the substring and reallocate the array
                char *p = malloc(len + 1);
                char **new_arr = realloc(arr, sizeof(*arr) * (pos + 2));
                if (new_arr == NULL || p == NULL) {
                    // allocation failure: free the memory allocated so far
                    free(p);
                    if (new_arr)
                        arr = new_arr;
                    while (pos-- > 0)
                        free(arr[pos]);
                    free(arr);
                    return NULL;
                }
                arr = new_arr;
                memcpy(p, str - len, len);
                p[len] = '\0';
                arr[pos] = p;
                pos++;
                len = 0;
                if (*str == '\0')
                    break;
            } else {
                len++;
            }
            str++;
        }
        arr[pos] = NULL;
        return arr;
    }
    
    void strsep_free_(char **arr) {
        int i;
        // Free the array elements 
        for (i = 0; arr[i] != NULL; i++) {
            free(arr[i]);
            arr[i] = NULL;  // extra safety, not really needed
        }
        // Free The array itself 
        free(arr);
    }
    
    int main(void) {
        char **s = strsep_("Lorem ipsum four words", ' ');
        int i;
        for (i = 0; s[i] != NULL; i++) {
            printf("-> %s\n", s[i]);
        }
        strsep_free_(s);
        return 0;
    }
    

    Output:

    -> Lorem
    -> ipsum
    -> four
    -> words