Tags: size, character-encoding, character, bandwidth, delimiter

Do certain characters take more bytes than others?


I'm not very experienced with lower-level details such as how many bytes a character takes. I tried to find out whether one character equals one byte, but without success.

I need to set a delimiter used for socket connections between a server and clients. This delimiter has to be as small (in bytes) as possible, to minimize bandwidth.

The current delimiter is "#". Would switching to another delimiter reduce my bandwidth usage?


Solution

  • It depends on what character encoding you use to translate between characters and bytes (which are not at all the same thing):

    • In ASCII or ISO 8859, each character is represented by one byte
    • In UTF-32, each character is represented by 4 bytes
    • In UTF-8, each character uses between 1 and 4 bytes
    • In ISO 2022, it's much more complicated

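    US-ASCII characters (of which # is one) take only 1 byte in UTF-8, which is the most popular encoding that allows multibyte characters. So as long as you send your data as ASCII or UTF-8, "#" is already as small as a delimiter can be, and switching to another character would not save any bandwidth.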
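To check the sizes listed above yourself, here is a minimal Python sketch that encodes a character and counts the resulting bytes. The helper name `encoded_size` is made up for illustration, and the sample characters are arbitrary. It assumes the codec names from Python's standard library; the explicit little-endian variants ("utf-16-le", "utf-32-le") are used because the plain "utf-16" and "utf-32" codecs prepend a byte order mark, which would inflate the count.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Return how many bytes `text` occupies once encoded with `encoding`."""
    return len(text.encode(encoding))


if __name__ == "__main__":
    # "#" is US-ASCII; the other samples lie outside the ASCII range.
    samples = ("#", "é", "€", "😀")
    # Little-endian variants avoid the byte order mark added by "utf-16"/"utf-32".
    encodings = ("utf-8", "utf-16-le", "utf-32-le")

    for char in samples:
        sizes = {enc: encoded_size(char, enc) for enc in encodings}
        print(repr(char), sizes)
```

Running it should show that "#" takes a single byte in UTF-8 while the non-ASCII samples take more, which is why an ASCII delimiter is the cheapest choice when the connection uses UTF-8.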