Using strtok to split one string more than once leads to unexpected behavior


I faced a problem and I need someone to explain what's happening.

I have a string like a=1&b=2&c=3. I want to split it into ["a=1","b=2","c=3"] and then split each x=y pair.

Here is the code:


#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void f2(char *src)
{
    char *dest = (char *) calloc(strlen(src)+1,sizeof(char));    
   
    strcpy(dest,src); // I copy src to dest to guard src from being messed up with strtok

   // when I comment out the below line, src address doesn't change
   // but why is it changing the src address? I have copied the src to dest!
    char *token = strtok(dest, "="); 
    printf("dest addr: %p token addr: %p \n",dest,token);
}
void f1(char *src)
{
    char *token = strtok(src, "&");
    while (token)
    {
        printf("src addr: %p ", token);
        f2(token);
        token = strtok(NULL, "&");
    }
} 

And then I run the code like this:

TEST(CopyPointer, CopyStrTok)
{
    char str[]="a=1&b=2&c=3";
    f1(str);
}

Here is the result:

src addr: 0x7ffd4a00ec0c dest addr: 0x558a755d3350 token addr: 0x558a755d3350 // it's fine 
src addr: 0x558a755d3352 dest addr: 0x558a755d3370 token addr: 0x558a755d3370 
//               ^                         ^    
// now src addr is changed and it's pointing to the second character of dest

I can't explain why src is affected by f2 when I have copied src into another buffer called dest.

Correction:

As mentioned in one of the answers, the address of src does not change; only the address of token changes.


Solution

  • The strtok function uses static internal data to keep track of where it is.

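    So when you call strtok in f1, that state is associated with src (which is the same buffer as str in your test function). When you call strtok again in f2 with dest as the first argument, the state is overwritten and now refers to dest. When f1 then calls strtok with NULL as the first argument, it resumes from the saved position inside the buffer that dest pointed to, not inside src. That is why the second address printed in f1 is dest + 2. In your program that buffer is still allocated, because f2 never frees it (so it leaks); if f2 had freed it, the call would read freed memory and trigger undefined behavior.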
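
    The effect can be reproduced in isolation with standard strtok alone. This is a minimal sketch, not the asker's code: the name count_outer_tokens and the 64-byte copy buffer are made up for illustration, and copy stands in for dest in f2. With the nested call in place, the outer loop sees two tokens instead of the expected three.

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: counts how many tokens the outer "&" loop sees
     * when a nested strtok call on a copy runs inside the loop body. */
    int count_outer_tokens(char *src)
    {
        char copy[64];  /* assumed large enough; stands in for dest in f2 */
        int count = 0;

        char *token = strtok(src, "&");
        while (token) {
            count++;
            /* memmove, because token may already point into copy */
            memmove(copy, token, strlen(token) + 1);
            strtok(copy, "=");          /* overwrites strtok's saved position */
            token = strtok(NULL, "&");  /* resumes inside copy, not src */
        }
        return count;
    }

    int main(void)
    {
        char str[] = "a=1&b=2&c=3";
        /* prints 2, not 3: the second "token" is the "1" left over in copy */
        printf("outer tokens seen: %d\n", count_outer_tokens(str));
        return 0;
    }
    ```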

    If you want to use multiple levels of strtok, you should instead use strtok_r, a POSIX function that takes an additional parameter in which the caller stores the tokenizer state.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    void f2(char *src)
    {
        char *p = NULL; // tokenizer state for this "=" split only
        char *token = strtok_r(src, "=", &p);
        printf("token a=%s\n", token);
        token = strtok_r(NULL, "=", &p);
        printf("token b=%s\n", token);
    }
    
    void f1(char *src)
    {
        char *p = NULL; // separate state for the "&" split, untouched by f2
        char *token = strtok_r(src, "&", &p);
        while (token)
        {
            f2(token);
            token = strtok_r(NULL, "&", &p);
        }
    }
    
    int main(void)
    {
        char str[] = "a=1&b=2&c=3";
        f1(str);
        return 0;
    }
    

    Output:

    token a=a
    token b=1
    token a=b
    token b=2
    token a=c
    token b=3