
Counting words in a file like the Linux wc command in C


I am trying to write something that works like the Linux command wc, counting words, newlines and bytes in any kind of file, and I can only use the C function read. With the code below I get the correct values for newlines and bytes, but not the correct word count.

int bytes = 0;
int words = 0;
int newLine = 0;
char buffer[1];
int file = open(myfile,O_RDONLY);
if(file == -1){
  printf("can not find :%s\n",myfile);
}
else{
  char last = 'c'; 
  while(read(file,buffer,1)==1){
    bytes++;
    if(buffer[0]==' ' && last!=' ' && last!='\n'){
      words++;
    }
    else if(buffer[0]=='\n'){
      newLine++;
      if(last!=' ' && last!='\n'){
        words++;
      }
    }
    last = buffer[0];
  }        
  printf("%d %d %d %s\n",newLine,words,bytes,myfile);        
} 

Solution

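  • You should reverse your logic. Rather than looking for a space and incrementing the word count when a word ends, look for a non-space character that follows whitespace and increment the count when a word starts. Counting at the end of a word misses a final word that is not followed by a space or newline, and your test also does not treat tabs as whitespace. It also helps to keep a state variable rather than looking at the last char: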

    #include <fcntl.h>   /* open, O_RDONLY */
    #include <stdio.h>   /* printf */
    #include <unistd.h>  /* read, close */

    int main(void)
    {
       const char *myfile = "test.txt";
       int bytes = 0;
       int words = 0;
       int newLine = 0;
       char buffer[1];
       int file = open(myfile,O_RDONLY);
       enum states { WHITESPACE, WORD };
       int state = WHITESPACE;   /* start outside of any word */
       if(file == -1){
          printf("can not find :%s\n",myfile);
          return 1;
       }
       else{
          while (read(file,buffer,1) == 1)
          {
             bytes++;
             if ( buffer[0] == ' ' || buffer[0] == '\t' )
             {
                state = WHITESPACE;
             }
             else if (buffer[0] == '\n')
             {
                newLine++;
                state = WHITESPACE;
             }
             else
             {
                /* first non-whitespace byte after whitespace starts a new word */
                if ( state == WHITESPACE )
                {
                   words++;
                }
                state = WORD;
             }
          }
          printf("%d %d %d %s\n",newLine,words,bytes,myfile);
          close(file);
       }
       return 0;
    }
    

    wc also appears to have some logic for punctuation characters not counting as words, which this code does not handle.
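
    A possible refinement of the same state-machine idea, as a sketch only: classify bytes with isspace() from <ctype.h>, so that carriage returns, form feeds and vertical tabs also separate words, and call read with a larger buffer so that each byte does not cost a system call. The names wc_counts, wc_feed and wc_count_fd are made up for this illustration, and this does not claim to reproduce how wc itself decides what a word is.

```c
#include <ctype.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper type for this sketch; not part of any library. */
struct wc_counts {
    long lines;
    long words;
    long bytes;
};

/* Update the counts with n bytes from buf. *in_word carries the
   "inside a word" state across calls, so a word that is split over
   two chunks is counted only once. */
static void wc_feed(struct wc_counts *c, int *in_word, const char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* isspace() needs a value representable as unsigned char */
        unsigned char ch = (unsigned char)buf[i];
        c->bytes++;
        if (ch == '\n')
            c->lines++;
        if (isspace(ch)) {
            *in_word = 0;
        } else if (!*in_word) {
            /* first non-whitespace byte after whitespace: a word starts */
            *in_word = 1;
            c->words++;
        }
    }
}

/* Count everything readable from fd using only read().
   Returns 0 on success, -1 if read() reports an error. */
static int wc_count_fd(int fd, struct wc_counts *c)
{
    char buf[4096];
    ssize_t n;
    int in_word = 0;

    c->lines = c->words = c->bytes = 0;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        wc_feed(c, &in_word, buf, (size_t)n);
    return n < 0 ? -1 : 0;
}
```

    To use it, open the file as before, call wc_count_fd(file, &counts), and print counts.lines, counts.words and counts.bytes. Keeping the state outside wc_feed is what makes the chunked read safe: a word that straddles two read calls is still counted once.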