unicode encoding utf-7

Maximum size in bytes of a single 16-bit character in UTF-7 representation


What would be the maximum size in bytes of a single UTF-16 code unit (a 2-byte character, i.e. the char type in .NET) when saved in UTF-7 format?

This is what I've found on Wikipedia:

5 for an isolated case inside a run of single byte characters. For runs 2 2⁄3 per character plus padding to make it a whole number of bytes plus two to start and finish the run

http://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings#Seven-bit_environments


Solution

  • For a single UTF-16 code unit, the only number you need to pay attention to in there is 5.

    Essentially, in UTF-7, characters outside its "safe" alphabet are converted to UTF-16, and that is then encoded as modified Base64. A single UTF-16 code unit is 16 bits and each Base64 character carries 6 bits, so it needs 16/6 = 2 2/3 Base64 characters, which is padded with zero bits to a full 3. An escape character ("+") is added at the beginning, and possibly a terminator ("-") at the end, to mark the Base64 run, resulting in a maximum of 1 + 3 + 1 = 5 bytes.
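    The arithmetic above can be checked with a minimal Python sketch. It assumes Python's built-in "utf-7" codec, which writes the closing "-" when a string ends inside a Base64 run; the helper names utf7_run_bytes and worst_single_unit are hypothetical and exist only for this illustration. Other encoders (for example .NET's) may make different choices about optional characters, so treat this as a sanity check rather than a definitive statement about every implementation.

```python
def utf7_run_bytes(n_units: int) -> int:
    """Bytes for one shifted run of n UTF-16 code units: '+', Base64 digits, '-'."""
    base64_chars = (16 * n_units + 5) // 6  # ceil(16 * n / 6), 6 bits per Base64 char
    return 1 + base64_chars + 1


def worst_single_unit() -> int:
    """Longest UTF-7 encoding of any single BMP code point (lone surrogates skipped)."""
    return max(
        len(chr(cp).encode("utf-7"))
        for cp in range(0x10000)
        if not 0xD800 <= cp <= 0xDFFF
    )


if __name__ == "__main__":
    print("€".encode("utf-7"))   # b'+IKw-'
    print(utf7_run_bytes(1))     # 5
    print(worst_single_unit())   # 5
```

    For longer runs the per-character cost drops toward 2 2/3 bytes: three code units in one run take 1 + 8 + 1 = 10 bytes under the same assumptions.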