unicode, endianness, utf-16

How to convert between UTF-16 LE <-> BE?


If I want to convert between UTF-16 BE and LE, what should I consider? Can I treat the data as just a plain array of 2-byte integers? Or should I follow some special Unicode algorithm to handle exceptional cases?


Solution

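  • You only need to reorder the bytes of each 16-bit code unit: take two bytes, swap them, and write them back. That is all there is to consider. Surrogate pairs are not an exceptional case here, because each code unit of a pair is swapped on its own and the order of the code units does not change.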

    That said, there is usually a simple way to read a stream in one encoding and write it back in another, often with negligible performance cost (especially in the UTF-16 case). To make your code clearer, you should probably opt for such a solution. The trivial byte swap still works, provided you know the input encoding precisely.
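
As a rough illustration of the two approaches described above, here is a minimal Python sketch. The function names `swap_utf16_byte_order` and `recode_utf16` are made up for this example and do not come from any library. It assumes the whole input fits in memory and that you already know which byte order the input uses.

```python
def swap_utf16_byte_order(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit code unit (LE -> BE or BE -> LE).

    Hypothetical helper for illustration. It does not validate that the
    input is well-formed UTF-16; it only reorders bytes.
    """
    if len(data) % 2 != 0:
        raise ValueError("UTF-16 data must contain an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # second byte of each unit moves to the front
    swapped[1::2] = data[0::2]  # first byte of each unit moves to the back
    return bytes(swapped)


def recode_utf16(data: bytes, src: str = "utf-16-le", dst: str = "utf-16-be") -> bytes:
    """Decode with one codec and encode with the other.

    Hypothetical helper for illustration; uses Python's built-in
    "utf-16-le" and "utf-16-be" codecs.
    """
    return data.decode(src).encode(dst)


if __name__ == "__main__":
    # Sample text with a character outside the BMP (encoded as a surrogate pair).
    text = "h\u00e9llo \U0001F600"
    little = text.encode("utf-16-le")
    big = text.encode("utf-16-be")

    # Both approaches should agree with the codec's own big-endian output.
    assert swap_utf16_byte_order(little) == big
    assert recode_utf16(little) == big
```

The codec route also checks the input while decoding, so malformed data such as an unpaired surrogate is reported as an error, whereas the plain byte swap passes it through unchanged. Which behaviour you want depends on whether the data is trusted.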