Why do a bitwise-and of a character with 0xff?


I am reading some code that implements a simple parser. A function named scan breaks up a line into tokens. scan has a static variable bp that is assigned the line to be tokenized. Following the assignment, the whitespace is skipped over. See below. What I don't understand is why the code does a bitwise-and of the character that bp points to with 0xff, i.e., what is the purpose of * bp & 0xff? How is this:

while (isspace(* bp & 0xff))
    ++ bp;

different from this:

while (isspace(* bp))
    ++ bp;

Here is the scan function:

static enum tokens scan (const char * buf)
                    /* return token = next input symbol */
{   static const char * bp;

    while (isspace(* bp & 0xff))
        ++ bp;

        ..
}

Solution

  • From the C Standard (7.4 Character handling <ctype.h>)

    1 The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

    In this call

    isspace(* bp)
    

    the argument expression *bp having the type char is converted to the type int due to the integer promotions.

    If the type char behaves as the type signed char and the value of the expression *bp is negative, then the promoted value of the type int is also negative and cannot be represented as a value of the type unsigned char.

    This results in undefined behavior.

    In this call

    isspace(* bp & 0xff)
    

    due to the bitwise operator &, the value of the expression * bp & 0xff, which has the type int, lies in the range 0 to 255 and so can be represented as a value of the type unsigned char.

    So it is a trick used instead of writing clearer code such as

    isspace( ( unsigned char )*bp )
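
To make the difference concrete, here is a minimal sketch of the value each form would hand to isspace. The helper names (arg_unmasked, arg_masked, arg_cast, fits_unsigned_char) are hypothetical and not part of the original scan code. The sketch models a platform where plain char is signed by using signed char explicitly, and it assumes 8-bit bytes and two's complement integers.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT, UCHAR_MAX */

/* Assumption: 8-bit bytes, so that 0xff equals UCHAR_MAX. */
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

/* Value that isspace(*bp) would receive when plain char is signed
   (modelled here with signed char): the char is simply widened to int. */
int arg_unmasked(signed char c) { return c; }

/* Value that isspace(*bp & 0xff) would receive: c is promoted to int
   first, then every bit above the low eight is cleared. */
int arg_masked(signed char c) { return c & 0xff; }

/* Value that isspace((unsigned char)*bp) would receive: c is converted
   to unsigned char, then widened to int. */
int arg_cast(signed char c) { return (unsigned char)c; }

/* The <ctype.h> precondition, leaving the EOF case aside. */
int fits_unsigned_char(int v) { return v >= 0 && v <= UCHAR_MAX; }
```

For the byte 0xE9 (the value -23 as a signed char), arg_unmasked yields -23, which fails fits_unsigned_char, while arg_masked and arg_cast both yield 233, which passes. For ordinary ASCII characters such as ' ', all three helpers return the same value.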
    

    The function isspace is usually implemented in such a way that it uses its argument of the type int as an index into a table with 256 values (from 0 to 255). If the argument of the type int has a value that is greater than the maximum value 255 or is negative (and is not equal to the value of the macro EOF), then the behavior of the function is undefined.
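
The table lookup described above might look roughly like the following. This is a hypothetical illustration, not the source of any real C library: the names my_isspace, space_table, and init_space_table are invented here, and the extra table slot reserved for EOF is an assumption of this sketch. It classifies only the six whitespace characters of the "C" locale.

```c
#include <assert.h>
#include <limits.h>   /* UCHAR_MAX */
#include <stdio.h>    /* EOF */

/* Hypothetical table: slot 0 is reserved for EOF, and slots
   1 .. UCHAR_MAX + 1 hold the flags for values 0 .. UCHAR_MAX. */
static unsigned char space_table[UCHAR_MAX + 2];
static int table_ready = 0;

/* Mark the whitespace characters of the "C" locale in the table. */
static void init_space_table(void)
{
    static const char spaces[] = " \t\n\v\f\r";
    for (const char *p = spaces; *p != '\0'; ++p)
        space_table[(unsigned char)*p + 1] = 1;
    table_ready = 1;
}

/* Table-driven stand-in for isspace. The argument must be EOF or a
   value representable as unsigned char; anything else would index
   outside the table, which is why the precondition exists. */
int my_isspace(int c)
{
    assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));
    if (!table_ready)
        init_space_table();
    return space_table[c == EOF ? 0 : c + 1];
}
```

With this sketch, my_isspace(' ') and my_isspace('\n') return nonzero, and my_isspace('a') and my_isspace(EOF) return 0. A call such as my_isspace(-23) trips the assertion here, whereas an unchecked table lookup would read memory before the start of the table.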