How to convert hex UTF-16 char to string in arduino


I'm working on a function to correctly display Arabic words on a display/LCD. (Arabic letters have up to four contextual forms: isolated, initial, medial and final.) I have a table (the Map array) that lists each Arabic letter in its different forms. After working out which letter appears in which position, I need to pick the matching form for each one. My question is: how do I copy the Unicode characters from the table (Map) into a String variable (pBuffer)?

For example: To write the word باب you need to select the letter from Map table and place it in a String to send to the display/LCD.

...
const unsigned char Map[][5] PROGMEM = {

     /* code, isolated, initial, medial, final */
    {0x0621, 0xFE80, 0x0000, 0x0000, 0x0000 },  //1 /* HAMZA ء*/
    {0x0622, 0xFE81, 0x0000, 0x0000, 0xFE82 },  //2/* ALEF_MADDA آ*/
    {0x0623, 0xFE83, 0x0000, 0x0000, 0xFE84 },  //3/* ALEF_HAMZA_ABOVE أ*/
    {0x0624, 0xFE85, 0x0000, 0x0000, 0xFE86 },  //4/* WAW_HAMZA ؤ*/
    {0x0625, 0xFE87, 0x0000, 0x0000, 0xFE88 },  //5/* ALEF_HAMZA_BELOW إ*/
    {0x0626, 0xFE89, 0xFE8B, 0xFE8C, 0xFE8A },  //6/* YEH_HAMZA ئ*/
    {0x0627, 0xFE8D, 0x0000, 0x0000, 0xFE8E },  //7/* ALEF ا*/
    {0x0628, 0xFE8F, 0xFE91, 0xFE92, 0xFE90 }   //8/* BEH ب*/
};

String pBuffer;
pBuffer += ((char)(Map[7][4]));
pBuffer += ((char)(Map[6][6]));
pBuffer += ((char)(Map[7][3]));

u8g2.setCursor(5, 20);
u8g2.print(pBuffer);
...

Unfortunately the above method used does not work. How do I select characters from the "Map" table above and put them together in a String variable?


Solution

  • First, I suggest you look up the UTF-8 values for those Arabic characters. The Arduino String class and the u8g2 string functions work with UTF-8 encoded text, not UTF-16, so this problem is much more straightforward when you start from an array of UTF-8 values.

    For UTF-8 characters, the compiler can convert code points inside string literals for you:

    String character = u8"\u0628"; // ب
    

    Internally that string will contain two bytes that represent "ب" in UTF-8. A single char is not enough storage space for any Arabic character in UTF-8 or UTF-16, so you must use an array (char*) or a String.
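
    To make the two-byte claim concrete, here is a minimal sketch in plain C++ (not Arduino-specific). The names kBeh and byteLength are made up for this example. The bytes are written as hex escapes so the result does not depend on how the source file is saved.

```cpp
#include <cstddef>
#include <cstring>

// UTF-8 bytes for U+0628 (ARABIC LETTER BEH), spelled out as hex escapes.
// kBeh is a hypothetical name used only in this example.
const char kBeh[] = "\xD8\xA8";

// Returns the number of bytes (not characters) before the terminating NUL.
std::size_t byteLength(const char* s) {
    return std::strlen(s);
}
```

    Calling byteLength(kBeh) gives 2, which is why a single char cannot hold the letter.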

    The Arduino IDE allows you to just write Unicode character literals directly in the code too:

    String character = "ب";
    

    As long as the source code is saved in UTF-8 format, the string will be the same as above with the u8"\u0628" value.

    You could rewrite your character map to use String, and then either type the Arabic characters in literally or use the code point method (accented Latin characters are used as the example here). Note that String objects are constructed at run time, so this table cannot be placed in PROGMEM:

    const String Map[][5] = {
      {"a", "à", "á", "A", "Á"},
      {"e", "è", "é", "E", "É"}
    };
    

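    Of course a String uses more than 2 bytes to store each of those characters, so you can save space by storing the characters as 16-bit integers, but you will have to do some conversion beforehand.

    A Unicode code point is not the binary representation that you will see in the character buffer. U+0628 = ب, but its UTF-8 encoding is the two bytes 0xD8 0xA8. That is the value you should store in the Map (as 0xD8A8), not the code point (0x0628) you have now. Be aware that this only fits for code points below U+0800: the presentation forms in your table (U+FE80 and up) take three bytes each in UTF-8 (U+FE8F is 0xEF 0xBA 0x8F), so for those you would either store three bytes per entry or keep the code point and encode it at run time.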

    const uint16_t Map[][5] PROGMEM = {
      {0xD8A8, ..., ..., ...},
      ...
    };
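
    If you would rather keep the code points in the table, the conversion to UTF-8 can be done at run time. The following is a rough sketch in plain C++, assuming every entry is a 16-bit code point; encodeUtf8 is a hypothetical helper name, not part of Arduino or u8g2, and std::string stands in for the Arduino String class so the snippet can be compiled on a PC.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one code point in the range U+0000..U+FFFF
// as UTF-8. Surrogate values (U+D800..U+DFFF) are not rejected in this sketch.
std::string encodeUtf8(uint16_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

    With this, encodeUtf8(0x0628) yields the bytes 0xD8 0xA8 and encodeUtf8(0xFE8F) yields 0xEF 0xBA 0x8F, so the original table layout can stay as it is.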
    

    If you use String for your map, you can easily build strings from it:

    String output = Map[i][2] + Map[j][3] + Map[k][4];
    

    If you use uint16_t, then you have to split the integer value into two bytes to add a character. Because the table is declared PROGMEM, on AVR boards such as the Uno it has to be read with pgm_read_word():

    uint16_t v = pgm_read_word(&Map[i][j]);  // v = 0xD8A8 for example
    char hi = v >> 8;     // The D8 part of 0xD8A8, hi = 0xD8
    char lo = v & 0xFF;   // The A8 part of 0xD8A8, lo = 0xA8
    String output = String(hi) + String(lo); // output = {0xD8, 0xA8}
    

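    Lastly, you will have to convert the String to a char* buffer to use it with U8g2::drawUTF8(). You can do that with output.c_str(), which returns the underlying char* buffer. If you want to keep using u8g2.print() instead, call u8g2.enableUTF8Print() once in setup() so that print() interprets the bytes as UTF-8.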
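
    Putting the pieces together for the word باب from the question, here is a hedged sketch in plain C++. It assumes the table keeps 16-bit code points as in the question and converts them while building the string. The names Form, kMap, appendUtf8 and buildBab are invented for this example, and std::string stands in for the Arduino String class.

```cpp
#include <cstdint>
#include <string>

// Column indices matching the layout of the question's table.
enum Form { CODE = 0, ISOLATED = 1, INITIAL = 2, MEDIAL = 3, FINAL = 4 };

// Two rows copied from the question's table, stored as 16-bit code points.
// (On AVR this could live in PROGMEM and be read with pgm_read_word.)
const uint16_t kMap[][5] = {
    {0x0627, 0xFE8D, 0x0000, 0x0000, 0xFE8E},  // ALEF
    {0x0628, 0xFE8F, 0xFE91, 0xFE92, 0xFE90},  // BEH
};

// Hypothetical helper: append one code point (U+0000..U+FFFF) as UTF-8.
void appendUtf8(std::string& out, uint16_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Builds the word in logical (typing) order:
// BEH in initial form, ALEF in final form, BEH in isolated form.
std::string buildBab() {
    std::string out;
    appendUtf8(out, kMap[1][INITIAL]);
    appendUtf8(out, kMap[0][FINAL]);
    appendUtf8(out, kMap[1][ISOLATED]);
    return out;
}
```

    In an Arduino sketch the same logic should carry over by swapping std::string for String and passing the c_str() of the result to u8g2.drawUTF8(). The selected font has to contain these presentation-form glyphs, and since U8g2 draws glyphs left to right as far as I know, you may need to append the letters in reverse order for the word to read correctly on the display.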