Tags: c, pointers, kernighan-and-ritchie

Confusing getchar using read() under Linux


  1. Why do we write *bufp = buf? Since both are arrays, in my opinion it should be:

    static char bufp = buf;
    
  2. How does *bufp "know" from which position to start displaying? It is not initialized to zero in any way. After buf is assigned to bufp, I'd expect the return line to start with the last entered char.

  3. Is the unsigned char cast used here just to rule out the case of -1 being the input, which means EOF on most systems?

#include "syscalls.h"   /* K&R's own header: declares read() and defines BUFSIZ */
/* getchar: simple buffered version */
int getchar(void)
{
    static char buf[BUFSIZ];
    static char *bufp = buf; /* [1] */
    static int n = 0;
    if (n == 0) {            /* buffer is empty */
        n = read(0, buf, sizeof buf);
        bufp = buf;          /* [1] here it is written as in my question, so which one is right? */
    }
    return (--n >= 0) ? (unsigned char) *bufp++ : EOF; /* [2] & [3] */
}

Solution

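[1] `static char bufp = buf;` is incorrect. bufp is not an array but a pointer, and `char bufp` would declare a single character, which cannot hold an address. buf is an array of char; in most expressions, including this initializer, its name is converted to a pointer to its first element, `&buf[0]`. So bufp must be declared as `char *bufp`, a pointer to a char (to the first char, though the following ones can be reached through it as well). In a declaration, the `*` is part of the type: `static char *bufp = buf;` declares bufp as a pointer and initializes bufp itself, not `*bufp`, with buf. That is why the statement `bufp = buf;` inside the if block is also correct: both forms set the pointer, just at different times.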

[2] At the start, bufp points to the buf array, i.e. to its first character, and n is 0. buf, bufp and n are all static, meaning they keep existing, with their values, after the function returns: each one is initialized once, before the program starts, and the initialization is not repeated when the function is called again. Thus they "remember" the state of the buffer:

• `n` is the number of characters still in the buffer, waiting to be returned one by one,
• `bufp` points to the next character to be returned (when n > 0),
• and `buf`, the array itself, holds the characters that were last read.
    

So, to answer your question [2]:

• when no character is available (n == 0), a call to read fills buf and bufp is reset to the beginning of that array. read stores the first byte it receives at buf[0], so bufp starts at the first character that was typed, not the last one;
• then, as long as not all the characters in the buffer have been returned (n > 0), *bufp is the next character to return. Postfix `++` has higher precedence than unary `*`, so `*bufp++` means `*(bufp++)`: it yields the character bufp currently points to, and the pointer then advances by one.

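[3] Plain char may be signed or unsigned, depending on the implementation. Where it is signed, converting *bufp to the returned int sign-extends it: the character's top bit (of its 8 bits) is copied into the upper bits of the int (the 24 most significant bits of a typical 32-bit int), so any byte above 127 would come out negative. The (unsigned char) cast prevents this, so every byte is returned as a value from 0 to 255 (e.g. the byte 0xC8 is returned as 200 rather than -56). This is indeed what keeps data apart from EOF: without the cast, the byte 0xFF would be returned as -1, which is EOF on most systems, and a caller looping until EOF would stop early. With the cast, only the end of input (or a read error) yields EOF.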