I usually try hard and harder to solve myself any bugs I find in my code, but this one is totally out of any logic for me. It works really fine with whatever strings and char separators, but only with that useless printf
inside the while
of the function, otherwise it prints
-> Lorem
then
-> ▼
and crashes aftwerwards. Thanks in advance to anyone that could tell me what is happening.
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <stdint.h>
char **strsep_(char *str, char ch) {
// Sub-string length
uint8_t len = 0;
// The number of sub-strings found means the same as the position where it will be stored in the main pointer
// Obviously, the number tends to increase over time, and at the end of the algorithm, it means the main pointer length too
uint8_t pos = 0;
// Storage for any found sub-strings and one more byte as the pointer is null-terminated
char **arr = (char**)malloc(sizeof(char **) + 1);
while (*str) {
printf("Erase me and it will not work! :)\n");
if (*str == ch) {
// The allocated memory should be one step ahead of the current usage
arr = realloc(arr, sizeof(char **) * pos + 1);
// Allocates enough memory in the current main pointer position and the '\0' byte
arr[pos] = malloc(sizeof(char *) * len + 1);
// Copies the sub-string size (based in the length number) into the previously allocated space
memcpy(arr[pos], (str - len), len);
// `-_("")_-k
arr[pos][len] = '\0';
len = 0;
pos++;
} else {
len++;
}
*str++;
}
// Is not needed to reallocate additional memory if no separator character was found
if (pos > 0) arr = realloc(arr, sizeof(char **) * pos + 1);
// The last chunk of characters after the last separator character is properly allocated
arr[pos] = malloc(sizeof(char *) * len + 1);
memcpy(arr[pos], (str - len), len);
// To prevent undefined behavior while iterating over the pointer
arr[++pos] = NULL;
return arr;
}
void strsep_free_(char **arr) {
char **aux = arr;
while (*arr) {
free(*arr);
*arr = NULL;
arr++;
}
// One more time to fully deallocate the null-terminated pointer
free(*arr);
*arr = NULL;
arr++;
// Clearing The pointer itself
free(aux);
aux = NULL;
}
int main(void) {
char **s = strsep_("Lorem ipsum four words", ' ');
char **i = s;
while (*i != NULL) {
printf("-> %s\n", *i);
i++;
}
strsep_free_(s);
}
Your program has undefined behavior, which means it may behave in unexpected ways, but could by chance behave as expected. Adding the extra printf
changes the behavior in a way the seems to correct the bug, but only by coincidence. On a different machine, or even on the same machine at a different time, the behavior may again change.
There are multiple bugs in your program that lead to undefined behavior:
You are not allocating the array with the proper size: it should have space fpr pos + 1
pointers, hence sizeof(char **) * (pos + 1)
. The faulty statements are: char **arr = (char**)malloc(sizeof(char **) + 1);
and arr = realloc(arr, sizeof(char **) * pos + 1);
.
Furthermore, the space allocated for each substring is incorrect too: arr[pos] = malloc(sizeof(char *) * len + 1);
should read arr[pos] = malloc(sizeof(char) * len + 1);
, which by definition is arr[pos] = malloc(len + 1);
. This does not lead to undefined behavior, you just allocate too much memory. If your system supports it, allocation and copy can be combined in one call to strndup(str - len, len)
.
You never check for memory allocation failure, causing undefined behavior in case of memory allocation failure.
Using uint8_t
for len
and pos
is risky: what if the number of substrings exceeds 255? pos
and len
would silently wrap back to 0
, producing unexpected results and memory leaks. There is no advantage at using such a small type, use int
or size_t
instead.
Here is a corrected version:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char **strsep_(const char *str, char ch) {
// Sub-string length
int len = 0;
// The number of sub-strings found, index where to store the NULL at the end of the array.
int pos = 0;
// return value: array of pointers to substrings with an extra slot for a NULL terminator.
char **arr = (char**)malloc(sizeof(*arr) * (pos + 1));
if (arr == NULL)
return NULL;
for (;;) {
if (*str == ch || *str == '\0') {
// alocate the substring and reallocate the array
char *p = malloc(len + 1);
char **new_arr = realloc(arr, sizeof(*arr) * (pos + 2));
if (new_arr == NULL || p == NULL) {
// allocation failure: free the memory allocated so far
free(p);
if (new_arr)
arr = new_arr;
while (pos-- > 0)
free(arr[pos]);
free(arr);
return NULL;
}
arr = new_arr;
memcpy(p, str - len, len);
p[len] = '\0';
arr[pos] = p;
pos++;
len = 0;
if (*str == '\0')
break;
} else {
len++;
}
str++;
}
arr[pos] = NULL;
return arr;
}
void strsep_free_(char **arr) {
int i;
// Free the array elements
for (i = 0; arr[i] != NULL; i++) {
free(arr[i]);
arr[i] = NULL; // extra safety, not really needed
}
// Free The array itself
free(arr);
}
int main(void) {
char **s = strsep_("Lorem ipsum four words", ' ');
int i;
for (i = 0; s[i] != NULL; i++) {
printf("-> %s\n", s[i]);
}
strsep_free_(s);
return 0;
}
Output:
-> Lorem
-> ipsum
-> four
-> words