The following code is showing strange behaviour. If I press Enter (newline) after typing the input, it prints the histogram values; if I enter EOF (^Z) directly after the text, it shows all zeros. Is there a problem with the getchar() function such that it only takes the input when a newline is pressed?
#include <stdio.h>
#define IN 1 /* inside a word */
#define OUT 0 /* outside a word */
#define MAXLEN 50
/* count lines, words, and characters in input */
main()
{
    int c, i, j, nc, state;
    int wordlength[MAXLEN];
    state = OUT;
    nc = 0;
    for (i = 0; i < MAXLEN; i++)
        wordlength[i] = 0;
    while ((c = getchar()) != EOF) {
        if (c == ' ' || c == '\n' || c == '\t') {
            if (state == IN) {
                wordlength[nc-1]++;
            }
            state = OUT;
        }
        else if (state == OUT) {
            //putchar('\n');
            state = IN;
            nc = 0;
        }
        if (state == IN) {
            ++nc;
        }
    }
    for (j = 0; j < MAXLEN; j++)
        printf("\n%d - %d",j,wordlength[j]);
    for (i = 10; i >= 0; i--) {
        for (j = 0; j < MAXLEN; j++)
            printf(((wordlength[j] > i)?"|":" "));
        printf("\n");
    }
}
Your code works more or less sanely for me unless I type a single word of input not followed by any white space (blank, tab, newline) before indicating EOF (Control-D on my machine; your use of Control-Z suggests you are running on Windows). If you indicate EOF without a final white space, the last word is not added to the histogram. You should, of course, also check that the word length is not too big, so that you do not index outside the wordlength array (if (nc > MAXLEN) nc = MAXLEN; to count all the very long words as the same size).
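After the main processing loop, you should check whether a word was still in progress when EOF arrived (state == IN, so nc > 0) and, if so, increment the appropriate entry in wordlength. Checking nc > 0 alone is not enough, because nc still holds the length of the last word after a trailing white space, and that word would be counted twice.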
Consider using isspace() from <ctype.h>, too.
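As a rough sketch of how those three suggestions (isspace(), clamping long words, and counting a final unterminated word) might fit together, here is a variant that reads from any FILE * so it can be tested without a terminal. The names word_length_histogram and record_word are my own inventions for illustration, and I've replaced the IN/OUT state with a simple "length of the word in progress" counter, which is a simplification rather than what your code does.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: record one word of length len (len >= 1).
   hist[k] counts words of length k+1; words longer than maxlen
   are all grouped into the last slot. */
static void record_word(int *hist, int maxlen, int len)
{
    if (len > maxlen)
        len = maxlen;
    hist[len - 1]++;
}

/* Hypothetical helper: tally word lengths read from `in`
   into hist[0..maxlen-1]. */
void word_length_histogram(FILE *in, int *hist, int maxlen)
{
    int c;
    int len = 0;    /* length of word in progress; 0 means outside a word */
    int i;

    for (i = 0; i < maxlen; i++)
        hist[i] = 0;

    while ((c = getc(in)) != EOF)
    {
        if (isspace(c))     /* safe: getc() returns an unsigned char value or EOF */
        {
            if (len > 0)
                record_word(hist, maxlen, len);
            len = 0;
        }
        else
            ++len;
    }
    if (len > 0)            /* input ended in the middle of a word */
        record_word(hist, maxlen, len);
}
```

Calling word_length_histogram(stdin, wordlength, MAXLEN) would give the same kind of table as the program in the question, with the last word counted whether or not it is followed by white space.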
I use enum instead of #define whenever possible so that the symbols are available in the debugger. You carefully avoided one common mistake; you made the variable c into an int, not a char.
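To show why the int matters, here is a small sketch; the three function names are made up for the illustration. getchar() returns either a byte value in the range 0..255 or the negative value EOF, and only an int can keep those apart. I'm assuming the usual case where EOF is -1 and converting 255 to signed char yields -1; both are true on common platforms but are not guaranteed by the standard.

```c
#include <stdio.h>

/* Hypothetical demo: does byte value `byte` (0..255), once stored in a
   signed char, compare equal to EOF?  On typical platforms 0xFF becomes
   -1, which is a false end-of-file. */
int signed_char_matches_eof(int byte)
{
    signed char sc = (signed char)byte;
    return sc == EOF;
}

/* Hypothetical demo: an unsigned char is never negative, so it can
   never compare equal to EOF and the read loop would never stop. */
int unsigned_char_matches_eof(int byte)
{
    unsigned char uc = (unsigned char)byte;
    return uc == EOF;
}

/* With an int, every byte value and EOF remain distinct. */
int int_matches_eof(int value)
{
    return value == EOF;
}
```

Whether plain char behaves like the signed or the unsigned case depends on the compiler, which is why the problem shows up on some machines and not on others.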
#include <stdio.h>
enum { IN = 1, OUT = 0 }; /* inside, outside a word */
enum { MAXLEN = 50 };
/* print a histogram of word lengths in the input */
int main(void)
{
    int c, i, j, nc, state;
    int wordlength[MAXLEN];
    state = OUT;
    nc = 0;
    for (i = 0; i < MAXLEN; i++)
        wordlength[i] = 0;
    while ((c = getchar()) != EOF)
    {
        if (c == ' ' || c == '\n' || c == '\t')
        {
            if (state == IN)
            {
                if (nc > MAXLEN)
                    nc = MAXLEN;    /* All long words grouped together */
                wordlength[nc-1]++;
            }
            state = OUT;
        }
        else if (state == OUT)
        {
            state = IN;
            nc = 0;
        }
        if (state == IN)
            ++nc;
    }
    /* Count a final word that was not followed by white space. */
    /* Test state, not just nc: nc keeps its value after a trailing blank. */
    if (state == IN && nc > 0)
    {
        if (nc > MAXLEN)
            nc = MAXLEN;    /* All long words grouped together */
        wordlength[nc-1]++;
    }
    for (j = 0; j < MAXLEN; j++)
        printf("\n%d - %d", j, wordlength[j]);
    for (i = 10; i >= 0; i--)
    {
        for (j = 0; j < MAXLEN; j++)
            putchar((wordlength[j] > i) ? '|' : ' ');
        printf("\n");
    }
    return 0;
}
You said you were having problems with your machine. I'd be very cautious about claiming to find a bug in the system, especially in such an obvious call as getchar(). I can't rule out the possibility, but that would be the last thing I'd think of blaming. I'd spend a lot of time working out what I've done wrong to break things before thinking there's a bug in getchar().
In the comments, you ask to be told why your program is not working in your environment. Since you've not (yet) formally identified the platform/environment where you are running your program, this is not possible.
However, I have demonstrated that your original as-posted program works reasonably sanely on a Unix-like environment (I'm testing on MacOS X 10.7.2, but it would work the same for any other similar Unix-like system). The revised version works slightly better; it will count the last word entered even if it is not followed by a space or newline.
If, as inferred, you are working on Windows, then the terminal I/O model may be different. In particular, the C standard makes it implementation-defined whether the last line of a text stream (perhaps including terminal input) requires a terminating newline; on some platforms, any characters after the last newline may be discarded. The behaviour for binary files is different. If the data after the last newline is being discarded, that would be consistent with the behaviour you are reporting. It may well be the expected behaviour, if you look at the documentation for your unidentified system. This is one of the areas of differences between implementations identified by P J Plauger in his excellent (but somewhat dated) 'The Standard C Library'.
However, if what I'm hypothesizing is correct, then I still wish to make it clear that your code is correct (enough); the trouble is simply that your expectations don't match the documented behaviour of your system. Note that reporting the platform on which you are working is sometimes crucial, and it becomes more crucial the closer you get to edge cases. And it is still extremely unlikely that you've hit on a bug in getchar().
Incidentally, when I was testing, I needed to type Control-D twice (and that was what I was expecting to have to do). The first time flushed the characters that I'd entered on the line (abc) to the program as a 3-byte read; the second also flushed the characters that I'd entered (all zero of them) to the program as a 0-byte read, which was then interpreted as EOF by getchar(). I also tested with abc followed by a blank, and then the EOF. Your code did not count the abc without a blank; it did count the abc when it was followed by a blank.
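The "0-byte read means EOF" rule can be seen directly at the POSIX level. This is only a sketch for Unix-like systems (it uses read() from <unistd.h>, so it won't apply as-is on Windows), and drain_fd is a name I made up; it is not how getchar() is actually implemented, just the same convention one layer down.

```c
#include <unistd.h>

/* Hypothetical helper: read everything from fd and return the total
   number of bytes seen, or -1 on error.  The loop ends when read()
   returns 0, which is how the system reports end of file; this is the
   0-byte read that a second Control-D produces at a terminal. */
long drain_fd(int fd)
{
    char buf[256];
    long total = 0;
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;     /* e.g. the 3-byte read for "abc" */
    return (n < 0) ? -1 : total;
}
```

Run as drain_fd(STDIN_FILENO) at a terminal, typing abc and then Control-D twice should give one 3-byte read followed by the 0-byte read that ends the loop.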