Tags: c, undefined-behavior, strtok

undefined behaviour: strtok


The function tokenize below is intended to set *size to 0 if sprt doesn't occur within str. So if sprt points to "|" and str to "D AO D", chunk[1] is supposed to be a NULL pointer and n is supposed to be set to 0:

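#include <stdio.h>   /* printf */
#include <stdlib.h>  /* calloc, free */
#include <string.h>  /* strlen, strcpy, strtok */

void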
tokenize(char *str,
         const char *sprt /*separator*/,
         char **buffer,
         int *size /*tokens length*/)
{
  char *chunk[2] = {NULL, NULL};

  //store str value into chunk[0]
  chunk[0] = calloc(strlen(str)+1, sizeof(char));
  strcpy(chunk[0], str);

  if (buffer!=NULL)
  {
    int sz = 0;
    chunk[1] = strtok(str, sprt);
    while (chunk[1]!=NULL)
    {
      buffer[sz] = calloc(strlen(chunk[1])+1, sizeof(char));
      strcpy(buffer[sz], chunk[1]);
      chunk[1] = strtok(NULL, sprt);
      sz++;
    }
  }
  else
  {
    *size=0;

    //if chunk is not NULL, the iteration begins => size > 0
    chunk[1] = strtok(str, sprt);

    while (chunk[1]!=NULL)
    {
      (*size)++;
      chunk[1] = strtok(NULL, sprt);
    }

    printf("size=%i\n", *size);
  }

  //restore str value from chunk[0]
  strcpy(str, chunk[0]);

  if (chunk[0]!=NULL) free(chunk[0]);
  if (chunk[1]!=NULL) free(chunk[1]);
}

However, when I test the function with the following code, "bug: n really needs to be 0!" is displayed, which means that strtok didn't work as I expected:

int main()
{
  char *test = calloc(7, sizeof(char));
  strcpy(test, "D AO D");

  int n;
  tokenize(test, "|", NULL, &n);
  if (n>0)
    printf("bug: n really needs to be 0!\n");
  else
    printf("no bug\n");
}

I don't really know what caused this UB. What am I doing wrong?


Solution

  • The first strtok call returns a pointer to the start of the original string "D AO D": since the string contains no "|" delimiter, the whole string is treated as a single token:

    chunk[1] = strtok(str, sprt);
    

    Then the while loop condition passes, since chunk[1] is a non-NULL pointer:

    while (chunk[1]!=NULL)
    {
      (*size)++;
      chunk[1] = strtok(NULL, sprt);
    }
    

    and *size is incremented in the first iteration. The next strtok call returns NULL because only the terminating '\0' byte remains, and the loop ends because its condition no longer holds. Thus *size ends up equal to 1, which is the expected, well-defined behaviour of strtok.
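
If the goal really is to get 0 when the separator never occurs in the string, one possible approach is to check for the separator before tokenizing. The sketch below is only an illustration under that assumption: count_tokens is a hypothetical helper, not part of the original code, and it uses the standard strpbrk to test whether any separator character is present. It works on a private copy because strtok modifies the string it scans.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original post).
 * Returns the number of tokens in str, 0 if no character of sprt
 * occurs in str at all, or -1 if the temporary copy cannot be allocated. */
static int count_tokens(const char *str, const char *sprt)
{
    /* strpbrk returns NULL when str contains none of the characters in sprt */
    if (strpbrk(str, sprt) == NULL)
        return 0;

    /* strtok writes '\0' bytes into its argument, so tokenize a copy */
    char *copy = malloc(strlen(str) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, str);

    int count = 0;
    for (char *tok = strtok(copy, sprt); tok != NULL; tok = strtok(NULL, sprt))
        count++;

    free(copy);
    return count;
}
```

With the input from the question, count_tokens("D AO D", "|") gives 0, while count_tokens("D|AO|D", "|") gives 3. Because the helper tokenizes a copy, the caller's string does not need to be saved and restored.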