Tags: c, string, strtok, ansi-c, null-terminated

skip strtok's null terminators safely


I want to use strtok and then return the string after the null terminator that strtok has placed.

char *foo(char *bar)
{
    strtok(bar, " ");
    return after_strtok_null(bar);
}
/*
examples:
foo("hello world") = "world"
foo("remove only the first") = "only the first"
*/

My goal is not simply to skip the first word (I know a simple while loop would do that). I want to call strtok once and then return the part that was not tokenized.

I will describe what I am trying to do at the end of the question, although I don't think it is really necessary.

One solution that came to mind was to simply skip all the null terminators until I reach a non-null character:

char *foo(char *bar)
{
    bar = strtok(bar, " ");
    while(!(*(bar++)));
    return bar;
}

This works fine for the examples shown above, but with a single word I may mistake the string's own null terminator for the one strtok placed, and then read past the end of the string into unallocated memory.

For example, if I call foo("demo" /* '\0' */), the result of strtok will be "demo" /* '\0' */, and if I then run the while loop I will access the memory after the string "demo". Another solution I tried was to use strlen, but it has exactly the same problem.
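The overrun described above comes from not knowing where the original string ended once strtok has run. A minimal sketch of one way around it, assuming the end of the string is recorded before tokenizing; `rest_after_first_token` is a hypothetical name and not part of the code above.

```c
#include <string.h>

/* Hypothetical helper: tokenizes once with strtok and returns the text that
   follows the terminator strtok wrote, or an empty string if nothing follows.
   The input must be writable (not a string literal), because strtok modifies it. */
char *rest_after_first_token(char *s)
{
    char *end = s + strlen(s);   /* the real terminator, saved before strtok */
    char *tok = strtok(s, " ");
    char *after;

    if (tok == NULL)             /* empty or all-space input: no token found */
        return end;

    after = tok + strlen(tok);   /* terminator of the first token */
    if (after == end)            /* the token ran to the real end: single word */
        return end;

    return after + 1;            /* first character after strtok's '\0' */
}
```

Called on a writable buffer holding "hello world" this returns "world", and on "demo" it returns an empty string instead of reading past the end. If several spaces separate the words, the result still starts with the remaining spaces, since strtok only overwrites the first delimiter.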

I am trying to write a function that receives a sentence. In some sentences the first word ends with a colon, though not necessarily. If the first word ends with a colon, the function needs to insert it (without the colon) into a global table and return the sentence without that word and without the spaces that follow it. Otherwise, it should just return the sentence without its leading spaces.
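For illustration, the behaviour described above could be sketched without strtok at all. Everything named here is an assumption made for the example: `strip_label`, `label_table`, `label_count` and the two size limits are hypothetical, and labels that do not fit are silently skipped.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LABELS    64   /* assumed table capacity */
#define MAX_LABEL_LEN 31   /* assumed maximum label length */

/* Hypothetical global table of labels collected so far. */
static char label_table[MAX_LABELS][MAX_LABEL_LEN + 1];
static size_t label_count = 0;

/* Returns the sentence without leading spaces and, if the first word ends
   with ':', without that word and the spaces after it. The input is not
   modified. */
const char *strip_label(const char *line)
{
    size_t len;

    line += strspn(line, " ");               /* drop leading spaces */
    len = strcspn(line, " ");                /* length of the first word */

    if (len > 1 && line[len - 1] == ':') {   /* first word ends with a colon */
        size_t n = len - 1;                  /* label length without the colon */
        if (label_count < MAX_LABELS && n <= MAX_LABEL_LEN) {
            memcpy(label_table[label_count], line, n);
            label_table[label_count][n] = '\0';
            label_count++;
        }
        line += len;                         /* skip the labelled word */
        line += strspn(line, " ");           /* and the spaces that follow it */
    }
    return line;
}
```

With this sketch, `strip_label("loop: add r1")` stores "loop" in the table and returns "add r1", while `strip_label("   mov a, b")` stores nothing and returns "mov a, b".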


Solution

  • You could use strcspn and strspn instead:

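    char *foo(char *bar) {
        size_t pos = strcspn(bar, " ");    // length of the first word
        pos = strspn((bar += pos), " ");   // bar now points just past the first word; pos counts the spaces there
        // *bar = '\0';   // uncomment to mimic strtok
        return bar + pos;
    }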
    

    You will get the expected substring, or an empty string if nothing follows the first word.

    A nice property is that you can avoid changing the original string, even though mimicking strtok is trivial...
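Because nothing has to be written to the input, the same idea can take a `const char *`, which also makes it safe to call on string literals such as the examples in the question. A minimal sketch; `skip_first_word` is a hypothetical name chosen for this example.

```c
#include <string.h>

/* Hypothetical read-only variant: skips the first word and the spaces after
   it without writing to the input. */
const char *skip_first_word(const char *s)
{
    s += strcspn(s, " ");   /* advance to the first space or to the terminator */
    s += strspn(s, " ");    /* advance past the run of spaces, if any */
    return s;
}
```

Here `skip_first_word("hello world")` yields "world" and `skip_first_word("demo")` yields an empty string, since strcspn stops at the real terminator. Unlike strtok, this version does not skip leading spaces before the first word, so an input that starts with a space is treated as having an empty first word.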