
Getting the last word in a line with strtok


Given a file with the following line:

word1 word2 word3 word4

I wrote the following code:

FILE* f = fopen("my_file.txt", "r");
if (f == NULL) {
    return NULL;
}
char line[MAX_BUFFER + 1];
/* use the buffer's real size so fgets cannot overrun it */
if (fgets(line, sizeof line, f) == NULL) {
    return NULL;
}
char* word = strtok(line, " ");
for (int i = 0; i < 4; i++) {
    printf("%s ", word);
    word = strtok(NULL, " ");
}

The goal is to print the words.

It works, but there is something I don't understand.

How does it get the last word, word4? I don't understand this, because there is no space after "word4".


Solution

  • I'm not quite sure what you're asking. Are you asking how the program was able to correctly read word4 from the file even though it wasn't followed by a space? Or are you asking why, when the program printed word4 back out, it didn't seem to print a space after it?

    The answer to the first question is that strtok is designed to give you tokens separated by delimiters, not terminated by delimiters. There is no requirement that the last token be followed by a delimiter.

    To see the answer to the second question, it may be clearer if we adjust the program and its printout slightly:

    char* word = strtok(line, " ");
    for (int i = 0; word != NULL; i++) {
        printf("%d: \"%s\"\n", i, word);
        word = strtok(NULL, " ");
    }
    

    I have made two changes here:

    1. The loop runs until word is NULL, that is, as long as strtok finds another word on the line. (This is to make sure we see all the words, and to make sure we're not trying to treat the fourth word specially in any way. If you were trying to treat the fourth word specially in some way, please say so.)
    2. The words are printed back out surrounded by quotes, so that we can see exactly what they contain.

    When I run the modified program, I see:

    0: "word1"
    1: "word2"
    2: "word3"
    3: "word4
    "
    

    That last line looks very strange at first, but the explanation is straightforward. You originally read the line using fgets, which does copy the terminating \n character into the line buffer. So it ends up staying tacked onto word4; that is, the fourth "word" is "word4\n".

    For this reason, it's often a good idea to include \n in the set of whitespace delimiter characters you hand to strtok -- that is, you can call strtok(line, " \n") instead. If I do that (in both of the strtok calls), the output changes to

    0: "word1"
    1: "word2"
    2: "word3"
    3: "word4"
    

    which may be closer to what you expected.