Counting chars, words and lines in a file


I am trying to count the number of characters, words, and lines in a file. The text file is:

The snail moves like a
Hovercraft, held up by a
Rubber cushion of itself,
Sharing its secret

And here is the code:

void count_elements(FILE* fileptr, char* filename, struct fileProps* properties) // counts chars, words and lines 
{
    fileptr = fopen(filename, "rb"); 
    int chars = 0, words = 0, lines = 0; 
    char ch;
    while ((ch = fgetc(fileptr)) != EOF  )
    {
        if(ch != ' ') chars++;
        if (ch == '\n') // check lines 
            lines++;
        if (ch == ' ' || ch == '\t' || ch == '\n' || ch == '\0') // check words
            words++;
      
    
    }
    fclose(fileptr); 
    properties->char_count = chars;
    properties->line_count = lines; 
    properties->word_count = words;

}

But when I print the number of chars, words and lines, the outputs are 81, 18 and 5 respectively. What am I missing? (The read mode does not change anything; I tried "r" as well.)


Solution

  • The solution I whipped up gives me the same results as the gedit document statistics:

    #include <stdio.h>
    
    void count_elements(const char *filename)
    {
        // fileptr can be a local variable: it is not used outside this function, so it does not belong in the signature.
        FILE *fileptr = fopen(filename, "rb");
        if (fileptr == NULL) // fopen returns NULL if the file cannot be opened
        {
            perror(filename);
            return;
        }
        int chars = 0, words = 0, lines = 0;
        int read; // int, not char: fgetc returns an unsigned char converted to int, or EOF (see the manpage for fgetc)
        unsigned char last_char = ' '; // Previous byte, used to tell a real word boundary from repeated whitespace
        while ((read = fgetc(fileptr)) != EOF)
        {
            unsigned char ch = (unsigned char)read; // Safe: EOF has been ruled out, so the value fits in an unsigned char
    
            if (ch >= 33 && ch <= 126) // Count only printable ASCII characters, excluding the space
            {
                ++chars;
            }
            else if (ch == '\n' || ch == '\r' || ch == '\t' || ch == '\0' || ch == ' ')
            {
                // A word ends here only if the previous character was printable
                if (last_char >= 33 && last_char <= 126)
                {
                    ++words;
                }
                if (ch == '\n')
                {
                    ++lines;
                }
            }
            last_char = ch;
        }
        fclose(fileptr);
    
        // Count a final word that is not followed by whitespace
        if (last_char >= 33 && last_char <= 126)
        {
            ++words;
        }
    
        printf("Chars: %d\n", chars);
        printf("Lines: %d\n", lines);
        printf("Words: %d\n", words);
    
    }
    
    int main(void)
    {
        count_elements("test");
        return 0;
    }
    

    See the comments in the code for remarks and explanations. The code also ignores other control characters: in a Windows CRLF line ending, the CR is not counted as a character and only the LF counts as a line.
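
The same idea can also be written with an explicit "inside a word" flag instead of remembering the previous character. The following is only a sketch under a few assumptions: the names `struct counts`, `counts_update`, `count_stream` and `count_text` are made up for illustration and do not come from the code above, the default "C" locale is in effect (so `isgraph` matches the 33 to 126 range used above), and a word is counted when it starts, which means a final word with no trailing whitespace is included.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical result type for this sketch (not from the original post). */
struct counts {
    long chars;   /* printable, non-space characters */
    long words;   /* runs of printable characters */
    long lines;   /* number of '\n' bytes */
    int in_word;  /* 1 while the scan is inside a word */
};

/* Feed one byte (a value in 0..255) into the running totals. */
void counts_update(struct counts *c, int ch)
{
    if (ch == '\n')
        c->lines++;

    if (isgraph(ch)) {
        c->chars++;
        if (!c->in_word) {   /* first character of a new word */
            c->words++;
            c->in_word = 1;
        }
    } else if (isspace(ch) || ch == '\0') {
        c->in_word = 0;      /* whitespace (including '\r') ends the word */
    }
    /* any other byte is ignored and leaves the state unchanged */
}

/* Count over an already open stream; the caller opens and closes it. */
struct counts count_stream(FILE *fp)
{
    struct counts c = {0, 0, 0, 0};
    int ch;                  /* int, so that EOF stays distinguishable */

    while ((ch = fgetc(fp)) != EOF)
        counts_update(&c, ch);
    return c;
}

/* Same logic on a NUL-terminated string, convenient for quick checks. */
struct counts count_text(const char *s)
{
    struct counts c = {0, 0, 0, 0};

    for (; *s != '\0'; s++)
        counts_update(&c, (unsigned char)*s);
    return c;
}
```

With these definitions, `count_text("The snail moves like a\n")` yields 18 characters, 5 words and 1 line. For a file, one would open it with `fopen(filename, "rb")`, check the result for `NULL`, pass it to `count_stream`, and then `fclose` it. Bytes above 126 (for example UTF-8 multibyte sequences) are not counted in this sketch, just as in the range check above.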