Tags: c, strtok, string-parsing, pointer-arithmetic

Removing substring from string before calling strtok in C


I have a variable that I would like to split into an array of paths:

PATH=/bin:/usr/bin:/usr/local/bin

Where the result of the above string would be the following:

[0] -> /bin
[1] -> /usr/bin
[2] -> /usr/local/bin

If I just call strtok on the string with the delimiter : I get the results:

[0] -> PATH=/bin
[1] -> /usr/bin
[2] -> /usr/local/bin

But then I still have PATH= as a substring at the first index of the array. I need to find a way to remove PATH= from the string before I call strtok.

Rather than having to reallocate a new array of char without the PATH= substring, I thought I could increment the char pointer to point to the first character after PATH=.

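char prefix[] = "PATH=";

/* strncmp() matches only at the start of str; strstr() would also
   match a "PATH=" that appears later in the string. */
if (strncmp(str, prefix, strlen(prefix)) == 0) {
    str += strlen(prefix);
}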

So now the pointer str points to the first / after PATH=.

Before str += strlen(prefix):

PATH=/bin:/usr/bin:/usr/local/bin
↑

After str += strlen(prefix):

PATH=/bin:/usr/bin:/usr/local/bin
     ↑

And I get the following array of paths from strtok.

[0] -> /bin
[1] -> /usr/bin
[2] -> /usr/local/bin

Would this be considered bad practice in C? Are there any side effects from doing so? Or should I take another approach, i.e. allocate a new buffer and copy the value of str into it without PATH=?


Solution

  • Call strtok() with "=" as the delimiter the first time, then do the calls with ":" (and NULL) thereafter. You can change the delimiters on each call to strtok() if you have a need to do so.

    Of course, this assumes you don't mind strtok() butchering your string in the first place. Make sure you're working on a copy of your PATH variable.
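
As a minimal sketch of the delimiter switch described above, assuming the input has the form NAME=a:b:c: the helper name split_path_var and its parameters are hypothetical, chosen only for illustration. It copies the string first, because strtok() overwrites delimiters with '\0'.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits "NAME=a:b:c" into at most `max` tokens.
 * The tokens point into a heap copy returned through *copy_out, which the
 * caller must free(). Returns the token count, or -1 if malloc() fails. */
int split_path_var(const char *var, char **copy_out, char *paths[], int max)
{
    assert(var != NULL && copy_out != NULL && paths != NULL);

    /* strtok() modifies its argument, so work on a private copy. */
    size_t len = strlen(var) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, var, len);
    *copy_out = copy;

    int n = 0;
    /* First call: "=" as the delimiter consumes the variable name. */
    if (strtok(copy, "=") != NULL) {
        char *tok;
        /* Later calls: NULL continues the same string, now split on ":". */
        while (n < max && (tok = strtok(NULL, ":")) != NULL)
            paths[n++] = tok;
    }
    return n;
}
```

A caller would pass an array such as char *paths[8], read the first n entries, and then free the copy once the tokens are no longer needed.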

    What you propose also works fine. The only gotcha I see is that if str is the only pointer to the start of dynamically allocated space, you can no longer free the space (a memory leak). You need to decide whether that's a problem; if it is, the solution is easy: keep a copy of the original pointer so you can free it.
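
The pointer-keeping advice could look roughly like the sketch below. The names skip_prefix and path_value_length are hypothetical, and the sketch checks the prefix with strncmp() so that it matches only at the start of the string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a pointer just past `prefix` when `str`
 * starts with it, otherwise returns `str` unchanged. Nothing is copied. */
const char *skip_prefix(const char *str, const char *prefix)
{
    assert(str != NULL && prefix != NULL);
    size_t n = strlen(prefix);
    return strncmp(str, prefix, n) == 0 ? str + n : str;
}

/* Hypothetical demo: length of the value part of a heap-allocated copy.
 * `base` keeps the address malloc() returned; `value` may point past it. */
size_t path_value_length(const char *var)
{
    assert(var != NULL);
    size_t len = strlen(var) + 1;
    char *base = malloc(len);
    if (base == NULL)
        return 0;
    memcpy(base, var, len);

    const char *value = skip_prefix(base, "PATH="); /* points into base */
    size_t result = strlen(value);

    /* Only the original pointer may be passed to free(); passing the
     * advanced pointer would be undefined behaviour. */
    free(base);
    return result;
}
```

The design choice is simply to advance a second pointer and leave the first one alone, so no extra buffer is needed and the allocation can still be released.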

    Also note that PATH is really weird:

    PATH=:/usr/bin::/bin:
    

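    has three implicit "." (current directory) elements in it: before the first :, between the two adjacent colons in the middle, and after the last one. Clearly, you would not normally have all three at once, but you need to know the rules of the game. Note that strtok() treats a run of delimiters as a single separator and skips leading and trailing ones, so it never reports these empty elements.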
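
If the empty elements matter, one option is to walk the value by hand instead of using strtok(). The following is only a sketch under that assumption; count_path_elements is a hypothetical name, and it expects the value part only (without the PATH= prefix).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: counts the ':'-separated elements of a PATH value,
 * including empty ones, and stores the number of empty elements in
 * *empty_out. In this sketch an empty string counts as one empty element. */
int count_path_elements(const char *value, int *empty_out)
{
    assert(value != NULL && empty_out != NULL);

    int total = 0;
    *empty_out = 0;

    const char *p = value;
    for (;;) {
        size_t len = strcspn(p, ":");  /* length up to the next ':' or end */
        total++;
        if (len == 0)
            (*empty_out)++;            /* empty element: current directory */
        if (p[len] == '\0')
            break;                     /* reached the end of the string */
        p += len + 1;                  /* step over the ':' */
    }
    return total;
}
```

For the value :/usr/bin::/bin: this reports five elements, three of them empty, whereas strtok() with ":" would return only /usr/bin and /bin.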