C programming - K&R example 1.5.2 - modified program not functioning as intended


My question is simply: why does my code on lines 10 and 11 not function properly? The intended purpose of my code is to do exactly what the original K&R code does, but NOT to increment nc whenever getchar() returns '\n'. Will you please enlighten me?

slightly modified K&R code:

/** K&R - 1.5.2 Character Counting **/
#include <stdio.h>

/* count characters in input; 1st version */
main(){
  long nc;

  nc = 0;
  while (getchar() != EOF){
    if (getchar() != '\n'){
      ++nc;
    }
  }
  printf("%ld\n", nc);
}

I use 64-bit Windows 7, Code::Blocks 10.05, and the GNU GCC compiler.

my current progress and understanding:

On a sample run, I type in the word two and hit Enter, which equals 4 input characters, after which I press Ctrl+Z to enter a ^Z (EOF). The program then prints 1. I was expecting it to print 3. I suppose the only logical explanation is that it is doing exactly the opposite of what I intended (it only counts newline characters?).

As it turns out, if I type in the word two and press Enter, let's say, 4 times, it prints 4. It seems to be incrementing nc for every newline character entered, and yet if I press Enter alone (in this case 4 times) and then EOF, it always prints 0.

Upon further experimentation, by some unseen hand, 4 is perhaps a magical number for this program. If I start it up, hit the Enter key a number of times divisible by 4, and then EOF, it prints 0. However, if I hit Enter some other number of times, the EOF does nothing, and I must enter ^Z on two rows, one after the other, to end the while loop correctly, and it prints 1. This is boggling my mind!


Solution

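  • The trouble is that each pass through the loop calls getchar() twice, so you read two characters for each time you might increment the count: one in the EOF test and a second in the newline test. The character read by the EOF test is thrown away without being counted, which is why typing two and Enter prints 1: 't' and 'o' are consumed by the EOF test, 'w' is counted, and the newline is skipped. You need to save the value from getchar(), in an int, and use that one value in both tests: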

    int c;
    
    while ((c = getchar()) != EOF)
    {
        if (c != '\n')
            ++nc;
    }
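
    To see the difference, here is a minimal sketch that puts the two loops side by side. It assumes the input comes from a FILE * rather than directly from stdin, so the functions can be tried on any stream; the names count_double_read and count_single_read are made up for this illustration and are not from K&R.

    ```c
    #include <stdio.h>

    /* Hypothetical name: mirrors the loop from the question.
     * Each iteration consumes TWO characters from the stream. */
    long count_double_read(FILE *in)
    {
        long nc = 0;

        while (getc(in) != EOF) {      /* first read: only tested against EOF */
            if (getc(in) != '\n')      /* second read: a different character  */
                ++nc;                  /* note: EOF here is also != '\n'      */
        }
        return nc;
    }

    /* Hypothetical name: the corrected loop.
     * Each iteration consumes ONE character and tests it twice. */
    long count_single_read(FILE *in)
    {
        long nc = 0;
        int c;                         /* int, so EOF stays distinguishable */

        while ((c = getc(in)) != EOF) {
            if (c != '\n')
                ++nc;
        }
        return nc;
    }
    ```

    On the input two followed by a newline, count_double_read returns 1 and count_single_read returns 3. Calling count_single_read(stdin) from main and printing the result with "%ld\n" gives the behavior the question was after.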
    

    The reason you need to store the result of getchar() in an int and not a char is that it can return every possible char value and also a distinct value, EOF. If you don't use an int (you store the result directly in a char), one of two things will happen:

    1. If char is a signed type, a legitimate character (often y-umlaut, ÿ, LATIN SMALL LETTER Y WITH DIAERESIS, U+00FF — at least in codesets derived from Latin 1 or ISO 8859-1) will be interpreted as equivalent to EOF, and your program will terminate prematurely.
    2. If char is an unsigned type, no character will ever be equivalent to EOF, so the program will never stop the loop.

    Neither of these circumstances is desirable. Storing the return value of getchar() in an int prevents both problems; it is the 'only' (or, at least, the simplest) correct way to do it.
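
The two failure modes can be reproduced without any input at all. The sketch below is only an illustration under stated assumptions: it uses explicit signed char and unsigned char so the result does not depend on which one plain char is, and the signed case assumes a typical platform (8-bit two's complement char, EOF defined as -1). The helper names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: store a byte value (0..255) in a signed char,
 * as if it had come from getchar(), then compare it with EOF.
 * The conversion of values above 127 is implementation-defined;
 * on typical platforms 0xFF becomes -1. */
int signed_char_looks_like_eof(int byte)
{
    signed char c = (signed char)byte;
    return c == EOF;
}

/* Hypothetical helper: store any getchar() result, including EOF,
 * in an unsigned char and compare it with EOF. The stored value is
 * never negative, and EOF is always negative, so this is never true
 * (assuming unsigned char promotes to int, as it does on mainstream
 * platforms). */
int unsigned_char_looks_like_eof(int value)
{
    unsigned char c = (unsigned char)value;
    return c == EOF;
}
```

Under those assumptions, signed_char_looks_like_eof(0xFF) reports a match even though 0xFF is an ordinary byte, while unsigned_char_looks_like_eof(EOF) reports no match even though the value really was EOF. Keeping the value in an int avoids both.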