I am trying to write something that works like the Linux command wc, counting words, newlines and bytes in any kind of file, and I can only use the C function read. I have written this code, and I get the correct values for newlines and bytes, but not for the word count.
int bytes = 0;
int words = 0;
int newLine = 0;
char buffer[1];
int file = open(myfile,O_RDONLY);
if(file == -1){
    printf("can not find :%s\n",myfile);
}
else{
    char last = 'c';
    while(read(file,buffer,1)==1){
        bytes++;
        if(buffer[0]==' ' && last!=' ' && last!='\n'){
            words++;
        }
        else if(buffer[0]=='\n'){
            newLine++;
            if(last!=' ' && last!='\n'){
                words++;
            }
        }
        last = buffer[0];
    }
    printf("%d %d %d %s\n",newLine,words,bytes,myfile);
}
You should reverse your logic. Rather than incrementing the word count when you see a space, increment it when you see the first non-space character after whitespace. That way a final word with no trailing whitespace is still counted. It also helps to track a state variable instead of comparing against the last character:
#include <fcntl.h>   /* open, O_RDONLY */
#include <stdio.h>   /* printf */
#include <unistd.h>  /* read, close */

int main(void)
{
    const char *myfile = "test.txt";
    int bytes = 0;
    int words = 0;
    int newLine = 0;
    char buffer[1];
    int file = open(myfile, O_RDONLY);
    enum states { WHITESPACE, WORD };
    int state = WHITESPACE;  /* start outside a word */

    if (file == -1) {
        printf("can not find :%s\n", myfile);
        return 1;
    }
    else {
        while (read(file, buffer, 1) == 1)
        {
            bytes++;
            if (buffer[0] == ' ' || buffer[0] == '\t')
            {
                state = WHITESPACE;
            }
            else if (buffer[0] == '\n')
            {
                newLine++;
                state = WHITESPACE;
            }
            else
            {
                /* the first non-whitespace byte after whitespace starts a new word */
                if (state == WHITESPACE)
                {
                    words++;
                }
                state = WORD;
            }
        }
        close(file);
        printf("%d %d %d %s\n", newLine, words, bytes, myfile);
    }
    return 0;
}
It appears that wc has some additional logic about punctuation characters not counting as words, which this code does not handle.
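For comparison, POSIX describes a word for wc as a non-zero-length string of characters delimited by white space, so one way to get closer to wc is to treat every character accepted by isspace() as a separator, not only space, tab and newline. Below is a minimal sketch under that assumption (C locale, single-byte characters). The names `struct counts`, `count_chunk` and `count_fd` are made up for illustration, and the sketch does not try to reproduce any extra handling a particular wc implementation may apply. It still uses only read for input, but reads in larger chunks, since one system call per byte is slow.

```c
#include <ctype.h>
#include <stddef.h>
#include <unistd.h>

/* Running totals plus the state carried across chunks (hypothetical helper type). */
struct counts {
    long lines;
    long words;
    long bytes;
    int in_word;  /* nonzero while the previous byte was part of a word */
};

/* Update the totals with n bytes from buf. */
void count_chunk(struct counts *c, const char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* isspace() requires a value representable as unsigned char */
        unsigned char ch = (unsigned char)buf[i];

        c->bytes++;
        if (ch == '\n')
            c->lines++;

        if (isspace(ch)) {
            c->in_word = 0;
        } else if (!c->in_word) {
            /* first non-whitespace byte after whitespace starts a word */
            c->in_word = 1;
            c->words++;
        }
    }
}

/* Count everything readable from an open descriptor.
   Returns 0 on success, -1 if read() reports an error. */
int count_fd(int fd, struct counts *c)
{
    char buf[4096];
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0)
        count_chunk(c, buf, (size_t)n);

    return n < 0 ? -1 : 0;
}
```

To use it, initialize the totals with `struct counts c = {0};`, call `count_fd(file, &c)` after a successful open, and print `c.lines`, `c.words` and `c.bytes`. Keeping `in_word` inside the struct is what lets a word that straddles two chunks be counted only once.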