I am a Java programmer testing my luck at C. I am trying to read a file in line by line and then count each individual word. So far I am not having luck separating each line into words. I am able to see each line and loop through the file correctly but my output is only the first word of each line. What am I doing wrong here?
    char printword[1024]= "";
    void print() {
        printf("%s", printword);
    }
    main()
    {
        FILE* f;
        errno_t err;
        err = fopen_s(&f, FILE_NAME, "r");
        if (&f == NULL) {
            exit(EXIT_FAILURE);
        }
        char line[1024];
        while (fgets(line, 1024, f) != NULL) {
            char * word;
            char *context = " ";
            word = strtok(line, " ");
            while (word != NULL) {
                strcpy(printword, strcat(word," "));
                print();
                word = strtok(NULL, " ");
            }
            printf("\n", NULL);
        }
        //}
        fclose(f);
        printf("Press any key to continue");
        getchar();
        exit(0);
    }
@BlueStrat appears to have put his finger on the issue with his comment.
When using strtok(), you must always remember that it does not allocate any memory, but instead returns pointers into the original string (inserting terminators in place of delimiters), and maintains an internal static pointer to the start of the next token. Suppose, then, that the first line of your input file contains
one two three
fgets() will read that into your line array:
        0         1
offset  0123456789012 3
line    one two three\0
The first strtok() call returns a pointer to the character at offset 0, sets the character at offset 3 to a terminator, and sets its internal state variable to point to the character at offset 4:
        0          1
offset  012 3456789012 3
line    one\0two three\0
        ^    ^
        |    |
        |    +-- (next)
        +------- word
Then you strcat an extra character onto the end of word, producing:
        0          1
offset  0123 456789012 3
line    one \0wo three\0
        ^    ^
        |    |
        |    +-- (next)
        +------- word
Now study that for a moment. Not only have you corrupted the data following the first token, you have done it in such a way that the internal state pointer points to a string terminator. When you next call strtok(), then, that function sees that it is already at the end of the string, and returns NULL to signal that there are no more tokens.
Instead of manipulating the token, which is perilous, copy its contents into the printword buffer and then append the extra space to that buffer.