Tags: assembly, unicode, ascii, mips, utf-16

How to convert UTF-16 to and from ASCII


I'm writing a subroutine in MIPS assembly language to convert ASCII into UTF-16 and vice versa, but I haven't been able to find a way to do the conversion.


Solution

  • Pseudocode, assuming that your bytes are octets and that no zero termination is required; a MIPS sketch follows each list:

    Conversion from ASCII to UTF-16

    1. Given an ASCII input string of length n (in bytes) stored sequentially in memory at address p.
    2. Allocate 2 × n bytes of memory; let the start address of that memory be q.
    3. While n is larger than zero:
      1. Check whether the byte at p is a valid ASCII character. If you don't use a parity bit, the most significant bit has to be zero; otherwise it has to be the correct parity bit. Issue an error if the byte is not valid.
      2. Zero-extend the byte at p to the 16-bit word at q. How this is done depends on the instruction set; e.g., x86 has MOVZX, and MIPS has lbu. You may also need to pay attention to endianness.
      3. Increment p by 1.
      4. Increment q by 2.
      5. Decrement n by 1.
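
    A minimal MIPS32 sketch of this loop (MARS/SPIM syntax), assuming $a0 holds p, $a1 holds n, $a2 holds a caller-allocated, 2-byte-aligned buffer q of 2 × n bytes, and the output should be UTF-16 in the machine's own byte order; the register choices, labels, and error-return convention are illustrative, not prescribed by the question:

        ascii_to_utf16:
            blez  $a1, a2u_done        # nothing to do if n <= 0
        a2u_loop:
            lbu   $t0, 0($a0)          # zero-extending load of the byte at p
            andi  $t1, $t0, 0x80       # isolate the most significant bit
            bnez  $t1, a2u_error       # set => not 7-bit ASCII
            sh    $t0, 0($a2)          # store a 16-bit code unit at q (host endianness)
            addiu $a0, $a0, 1          # p += 1
            addiu $a2, $a2, 2          # q += 2
            addiu $a1, $a1, -1         # n -= 1
            bgtz  $a1, a2u_loop
        a2u_done:
            li    $v0, 0               # 0 = success
            jr    $ra
        a2u_error:
            li    $v0, -1              # -1 = invalid input byte (assumed convention)
            jr    $ra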

    Lossless conversion from UTF-16 to ASCII

    1. Given a UTF-16 input string of length n (in code units) stored sequentially in memory at address p.
    2. Allocate n bytes of memory; let the start address of that memory be q.
    3. While n is larger than zero:
      1. Check whether the 16-bit word at p represents a valid ASCII character. The nine most significant bits have to be zero; otherwise the character is not representable in ASCII. Issue an error if the word is not valid.
      2. Move the least significant byte of the 16-bit word at p to the byte at q.
      3. If required, add a parity bit to the byte at q.
      4. Increment p by 2.
      5. Increment q by 1.
      6. Decrement n by 1.
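
    The reverse direction can be sketched the same way (MARS/SPIM syntax), assuming $a0 holds p (2-byte aligned), $a1 holds n in code units, $a2 holds a caller-allocated n-byte buffer q, the input is UTF-16 in the machine's own byte order, and no parity bit is added; names and the return convention are again only illustrative:

        utf16_to_ascii:
            blez  $a1, u2a_done        # nothing to do if n <= 0
        u2a_loop:
            lhu   $t0, 0($a0)          # zero-extending load of the code unit at p
            srl   $t1, $t0, 7          # keep only the nine most significant bits
            bnez  $t1, u2a_error       # any of them set => not representable in ASCII
            sb    $t0, 0($a2)          # store the least significant byte at q
            addiu $a0, $a0, 2          # p += 2
            addiu $a2, $a2, 1          # q += 1
            addiu $a1, $a1, -1         # n -= 1
            bgtz  $a1, u2a_loop
        u2a_done:
            li    $v0, 0               # 0 = success
            jr    $ra
        u2a_error:
            li    $v0, -1              # -1 = code unit outside the ASCII range (assumed convention)
            jr    $ra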