c linux string algorithm wc

Why is my wc implementation giving wrong word count?


Here is a small code snippet.

 while((c = fgetc(fp)) != -1)
    {
        cCount++; // character count
        if(c == '\n') lCount++; // line count
        else 
        {
            if(c == ' ' && prevC != ' ') wCount++; // word count
        }
        prevC = c; // previous character equals current character. Think of it as memory.
    }

When I run wc on a file containing the snippet above (as is), I get 48 words, but my program reports 59 words for the same input.

How can I count words exactly the way wc does?


Solution

  • You are counting a word every time a space follows any character that isn't a space, and a newline is one of those characters. This means that a newline followed by a space is counted as a word, and since your input (which is your own code snippet) is indented, you get a bunch of extra words.

    You should use isspace (declared in <ctype.h>) to check for whitespace instead of comparing the character to ' ':

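    /* needs #include <ctype.h>; c and prevC are ints, and prevC
       must start as a whitespace character such as ' ' */
    while((c = fgetc(fp)) != EOF)
    {
        cCount++;
        if (c == '\n')
            lCount++;
        /* a word ends where whitespace follows non-whitespace */
        if (isspace(c) && !isspace(prevC))
            wCount++;
        prevC = c;
    }
    /* a final word with no whitespace after it is still uncounted */
    if (!isspace(prevC))
        wCount++;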
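
    For reference, here is a minimal, self-contained sketch of the same idea written with an "inside a word" flag instead of a previous-character variable. The struct counts type and the count_stream function are hypothetical names made up for this example, not part of the question's program. It assumes plain single-byte text and treats a word as a maximal run of non-whitespace characters, which should agree with wc for ASCII input but is not guaranteed to match it in every locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper type for this sketch: the three totals wc reports. */
struct counts {
    long lines;
    long words;
    long chars;
};

/* Count newlines, words and bytes read from fp.
   A word is taken to be a maximal run of non-whitespace characters
   (an assumption that should match wc for plain ASCII text). */
struct counts count_stream(FILE *fp)
{
    struct counts n = {0, 0, 0};
    int c;           /* int rather than char, so EOF stays distinguishable */
    int in_word = 0; /* nonzero while inside a run of non-whitespace */

    assert(fp != NULL);

    while ((c = fgetc(fp)) != EOF) {
        n.chars++;
        if (c == '\n')
            n.lines++;
        if (isspace(c)) {
            in_word = 0;      /* whitespace ends the current word, if any */
        } else if (!in_word) {
            in_word = 1;      /* first character of a new word */
            n.words++;
        }
    }
    return n;
}
```

    Counting at the start of each word rather than at its end means leading indentation never adds a word, and a last word with no trailing newline is still counted. Printing n.lines, n.words and n.chars in that order mirrors the order of wc's default output.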