Is it safe to use
ch >= '\0' && ch <= ' '
as a condition that detects ASCII whitespace? (I am ignoring characters like non-breaking space.)
I am thinking of sequences like 0x8? 0x20, which would then be considered whitespace even though the first byte indicates that the sequence has not ended.
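Every byte of a multi-byte UTF-8 sequence, lead byte and continuation bytes alike, has its highest bit set (values 0x80 and above), so no byte in the range 0x00 - 0x20 can be part of such a sequence. The only bytes that do not have the highest bit set are the stand-alone bytes that encode the 128 characters of the US-ASCII table. A sequence like 0x8? 0x20 is therefore not valid UTF-8 to begin with: 0x20 is always a complete character by itself.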
Therefore, it is safe.
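To illustrate the point, here is a minimal sketch in C, assuming the text is held as a buffer of unsigned char. The helper names is_low_ascii and count_low_ascii are hypothetical, not from any library. Like the condition in the question, the range 0x00 - 0x20 also covers the ASCII control characters, not only space, tab and newline.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: true for bytes 0x00..0x20, the same range as the
   condition in the question. Taking unsigned char avoids depending on
   whether plain char is signed on the target platform. */
static int is_low_ascii(unsigned char b)
{
    return b <= 0x20;
}

/* Hypothetical helper: counts bytes in the 0x00..0x20 range in a UTF-8
   buffer. A plain byte-by-byte scan is enough, because lead and
   continuation bytes of multi-byte sequences are all >= 0x80 and can
   never match. */
static size_t count_low_ascii(const unsigned char *s, size_t n)
{
    size_t count = 0;
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (is_low_ascii(s[i]))
            count++;
    }
    return count;
}
```

For example, scanning the UTF-8 bytes of "a é", a tab, "€" and a newline should count only the space, the tab and the newline; the bytes 0xC3 0xA9 and 0xE2 0x82 0xAC are skipped because each of them is at least 0x80.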