
C pointer to array declaration with bitwise and operator


I want to understand the following code:

//...
#define _C 0x20
extern const char *_ctype_;
//...
__only_inline int iscntrl(int _c)
{
    return (_c == -1 ? 0 : ((_ctype_ + 1)[(unsigned char)_c] & _C));
}

It originates from the file ctype.h in the OpenBSD operating system source code. This function checks whether a char is a control character or a printable letter inside the ASCII range. This is my current chain of thought:

  1. iscntrl('a') is called and 'a' is converted to its integer value
  2. first check if _c is -1 then return 0 else...
  3. increment the address the undefined pointer points to by 1
  4. declare this address as a pointer to an array of length (unsigned char)((int)'a')
  5. apply the bitwise and operator to _C (0x20) and the array (???)

Somehow, strangely, it works, and every time 0 is returned the given char _c is not a printable character. Otherwise, when it's printable, the function just returns an integer value that's not of any special interest. My problem of understanding is in steps 3, 4 (a bit) and 5.

Thank you for any help.


Solution

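  • _ctype_ appears to be an internal character classification table, with one entry of flag bits per character code. The + 1 most likely skips a leading slot reserved for EOF (-1): in the traditional macro form of these functions, (_ctype_ + 1)[-1] lands on entry 0 of the table, so EOF needs no special case. In this inline version the explicit check against -1 makes that slot redundant, but the table layout stays the same.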

    The C standard dictates this for all ctype.h functions:

    In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF
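
To illustrate that requirement from the caller's side: a plain char may be signed, so a byte with the high bit set can arrive as a negative int that is neither EOF nor representable as unsigned char. A minimal sketch, using a hypothetical helper named count_cntrl that is not part of any library, which casts each byte before calling iscntrl:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the control characters in a string. */
static size_t count_cntrl(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* Cast first: a negative char value other than EOF is outside
           the range the ctype.h functions are required to accept. */
        if (iscntrl((unsigned char)*s))
            n++;
    }
    return n;
}
```

In the default "C" locale, count_cntrl("a\tb\n") counts the tab and the newline.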

    Going through the code step by step:

    • int iscntrl(int _c) The parameter is declared int although it really carries a character, because all ctype.h functions are required to handle EOF as well.
    • The check against -1 is a check against EOF, since it has the value -1.
    • _ctype_ + 1 is pointer arithmetic: it yields a pointer to the second element of the table (index 1). Nothing is incremented or declared here; the pointer _ctype_ itself is unchanged.
    • [(unsigned char)_c] is simply an array access through that pointer; (_ctype_ + 1)[i] is the same element as _ctype_[i + 1]. The cast is there to enforce the standard requirement of the parameter being representable as unsigned char. Note that char can actually hold a negative value, so this is defensive programming. The result of the [] access is a single char from the internal table.
    • The & masking tests one flag bit in that table entry. Apparently every entry with bit 5 set (mask 0x20) belongs to a control character. Which characters those are can only be confirmed by viewing the table.
    • An entry with bit 5 set yields 0x20 after masking, which is a non-zero value. This satisfies the requirement that the function return non-zero in case of boolean true.
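
The steps above can be sketched with a toy table. This is a simplified model and not OpenBSD's actual table: the names toy_table, toy_ctype, toy_init and toy_iscntrl are made up for illustration, only the control-character flag is filled in, and slot 0 is assumed to be a spare entry for EOF.

```c
#define TOY_C 0x20  /* same flag value as _C in the question */

/* Hypothetical table: slot 0 is a spare entry, slots 1..256 hold the
   flag bits for character codes 0..255. */
static char toy_table[1 + 256];
static const char *toy_ctype = toy_table;

/* Mark the ASCII control characters (0x00-0x1F and 0x7F). */
static void toy_init(void)
{
    for (int ch = 0; ch < 0x20; ch++)
        toy_table[1 + ch] |= TOY_C;
    toy_table[1 + 0x7F] |= TOY_C;
}

static int toy_iscntrl(int c)
{
    /* (toy_ctype + 1)[i] is the same element as toy_ctype[i + 1]:
       pointer arithmetic followed by an ordinary array access. */
    return (c == -1 ? 0 : ((toy_ctype + 1)[(unsigned char)c] & TOY_C));
}
```

After toy_init() has run, toy_iscntrl('\n') yields 0x20 (non-zero, so "true") and toy_iscntrl('a') yields 0.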