I'm working on a function to correctly display Arabic words on the display/LCD. (Arabic letters have up to four different forms.)
I have an array of Arabic letters (the Map array) in their different forms.
After recognizing each Arabic letter, I need to re-arrange the letters.
My question is: how do I take Unicode characters from the table (Map) and put them into a String variable (pBuffer)?
For example, to write the word باب, you need to select each letter from the Map table and place it in a String to send to the display/LCD.
...
const unsigned char Map[][5] PROGMEM = {
/* code, isolated, initial, medial, final */
{0x0621, 0xFE80, 0x0000, 0x0000, 0x0000 }, //1 /* HAMZA ء*/
{0x0622, 0xFE81, 0x0000, 0x0000, 0xFE82 }, //2/* ALEF_MADDA آ*/
{0x0623, 0xFE83, 0x0000, 0x0000, 0xFE84 }, //3/* ALEF_HAMZA_ABOVE أ*/
{0x0624, 0xFE85, 0x0000, 0x0000, 0xFE86 }, //4/* WAW_HAMZA ؤ*/
{0x0625, 0xFE87, 0x0000, 0x0000, 0xFE88 }, //5/* ALEF_HAMZA_BELOW إ*/
{0x0626, 0xFE89, 0xFE8B, 0xFE8C, 0xFE8A }, //6/* YEH_HAMZA ئ*/
{0x0627, 0xFE8D, 0x0000, 0x0000, 0xFE8E }, //7/* ALEF ا*/
{0x0628, 0xFE8F, 0xFE91, 0xFE92, 0xFE90 } //8/* BEH ب*/
};
String pBuffer;
pBuffer += ((char)(Map[7][4]));
pBuffer += ((char)(Map[6][6]));
pBuffer += ((char)(Map[7][3]));
u8g2.setCursor(5, 20);
u8g2.print(pBuffer);
...
Unfortunately, the method above does not work.
How do I select characters from the Map table above and put them together in a String variable?
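There are a few reasons the code above does not work. An unsigned char holds only 8 bits, so it cannot store values such as 0x0628 or 0xFE8F, and casting to (char) keeps only the low byte. Map[6][6] is out of range, because each row has only five entries (indices 0 to 4). On AVR boards, data declared PROGMEM also has to be read with the pgm_read_* functions instead of being indexed directly.
More fundamentally, I suggest that you look up the UTF-8 values for those Arabic characters. Arduino and u8g2 both work with UTF-8 encoded text, but not UTF-16. This problem is much more straightforward to solve when you start with an array of UTF-8 values.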
For UTF-8 characters, the compiler can convert code points inside string literals for you:
String character = u8"\u0628"; // ب
Internally that string will contain two bytes that represent "ب" in UTF-8.
A single char is not enough storage space for any Arabic character in UTF-8 or UTF-16, so you must use a char array (char*) or a String.
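A minimal way to check this with a desktop compiler, assuming GCC or Clang with their default UTF-8 execution character set: the plain literal "\u0628" occupies two bytes plus the terminator, so it has to be copied into an array, never into a single char. The sketch uses a plain literal instead of u8"..." only because u8 literals become char8_t in C++20; copyChar is a hypothetical helper name used for illustration.

```cpp
#include <cassert>
#include <cstring>

// U+0628 ARABIC LETTER BEH (ب). GCC and Clang encode narrow string literals
// as UTF-8 by default, so this array holds 0xD8, 0xA8 and a terminating '\0'.
// (A u8"..." prefix guarantees UTF-8, but in C++20 it produces char8_t.)
const char kBeh[] = "\u0628";

// Hypothetical helper: copies one UTF-8 encoded character, including its
// terminator, into `dest`. A single char cannot hold ب, so `dest` must be
// an array with room for every byte plus '\0'.
void copyChar(char* dest, std::size_t destSize, const char* utf8Char) {
  std::size_t len = std::strlen(utf8Char);
  assert(destSize > len);  // len bytes plus the terminator must fit
  std::memcpy(dest, utf8Char, len + 1);
}
```

With char buf[4], calling copyChar(buf, sizeof buf, kBeh) leaves buf holding 0xD8, 0xA8, '\0'.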
The Arduino IDE allows you to just write Unicode character literals directly in the code too:
String character = "ب";
As long as the source code is saved in UTF-8 format, the string will be the same as the u8"\u0628" value above.
You could rewrite your character map to use String, and then type in the Arabic characters literally or with the code point method (accented Latin characters are used here as an example):
// String objects are constructed at runtime in RAM, so this table cannot be PROGMEM
const String Map[][5] = {
{"a", "à", "á", "A", "Á"},
{"e", "è", "é", "E", "É"}
};
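The same table idea also works with plain const char* pointers to UTF-8 string literals instead of String objects. The sketch below is portable C++ rather than an Arduino sketch, and the names kMap, appendGlyph and buildBab are hypothetical. It assumes the display draws glyphs strictly left to right and does not reorder right-to-left text, so باب is stored in visual order: isolated BEH, final ALEF, initial BEH.

```cpp
#include <cassert>
#include <cstring>

// Column order follows the question's Map: code, isolated, initial, medial, final.
enum Form { kCode = 0, kIsolated = 1, kInitial = 2, kMedial = 3, kFinal = 4 };
enum Letter { kAlef = 0, kBeh = 1 };

// Each entry is a UTF-8 string literal; "" marks a form the letter does not have.
const char* const kMap[][5] = {
  {"\u0627", "\uFE8D", "",       "",       "\uFE8E"},  // ALEF
  {"\u0628", "\uFE8F", "\uFE91", "\uFE92", "\uFE90"},  // BEH
};

// Appends the UTF-8 bytes of `glyph` to the NUL-terminated string in `out`.
void appendGlyph(char* out, std::size_t outSize, const char* glyph) {
  assert(std::strlen(out) + std::strlen(glyph) < outSize);  // keep room for '\0'
  std::strcat(out, glyph);
}

// Builds باب in visual (left-to-right) order for a display that does no
// right-to-left reordering: isolated BEH, final ALEF, initial BEH.
void buildBab(char* out, std::size_t outSize) {
  assert(outSize > 0);
  out[0] = '\0';
  appendGlyph(out, outSize, kMap[kBeh][kIsolated]);
  appendGlyph(out, outSize, kMap[kAlef][kFinal]);
  appendGlyph(out, outSize, kMap[kBeh][kInitial]);
}
```

Each presentation form takes three bytes in UTF-8, so the finished word is 9 bytes plus the terminator; a 16-byte buffer is enough.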
Of course, a String uses more than 2 bytes to store each of those characters, so you can save space by storing the characters as 16-bit integers, but you will have to do some conversion beforehand.
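A Unicode code point is not the binary representation that you will see in the character buffer. U+0628 is ب, but its UTF-8 representation is the two bytes 0xD8 0xA8 (0xD8A8 when packed into a 16-bit integer). That is the value you should store in the Map, not the code point (0x0628) as you have now.
This only works for characters that take two bytes in UTF-8 (U+0080 to U+07FF, which covers the basic Arabic letters U+0600 to U+06FF). The presentation forms in your table (U+FE80 and above) take three bytes in UTF-8; U+FE8F, for example, is 0xEF 0xBA 0x8F, so they do not fit in a uint16_t. For those, use the String table instead, or keep the code points and convert them to UTF-8 at runtime.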
const uint16_t Map[][5] PROGMEM = {
{0xD8A8, ..., ..., ...},
...
};
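If you would rather keep code points in the table (as in your original Map, but with a 16-bit element type so nothing is truncated), another option is to encode each code point to UTF-8 at runtime. This is a swapped-in technique, not something u8g2 does for you. The names appendUtf8, kMap and buildBab are hypothetical, the encoder handles only code points up to U+FFFF (enough for the Arabic block and the presentation forms), and it is written as portable C++; on AVR you would also mark the table PROGMEM and read each entry with pgm_read_word().

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Encodes one code point in U+0001..U+FFFF as UTF-8 and appends it to the
// NUL-terminated string in `out`. Callers should not pass 0 or surrogate
// values (U+D800..U+DFFF). Returns the number of bytes written.
std::size_t appendUtf8(char* out, std::size_t outSize, std::uint16_t cp) {
  char bytes[3];
  std::size_t n;
  if (cp < 0x80) {           // 1 byte:  0xxxxxxx
    bytes[0] = static_cast<char>(cp);
    n = 1;
  } else if (cp < 0x800) {   // 2 bytes: 110xxxxx 10xxxxxx
    bytes[0] = static_cast<char>(0xC0 | (cp >> 6));
    bytes[1] = static_cast<char>(0x80 | (cp & 0x3F));
    n = 2;
  } else {                   // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    bytes[0] = static_cast<char>(0xE0 | (cp >> 12));
    bytes[1] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    bytes[2] = static_cast<char>(0x80 | (cp & 0x3F));
    n = 3;
  }
  std::size_t len = std::strlen(out);
  assert(len + n < outSize);  // keep room for the terminating '\0'
  std::memcpy(out + len, bytes, n);
  out[len + n] = '\0';
  return n;
}

// Code points as in the question, stored as uint16_t.
// Columns: code, isolated, initial, medial, final.
const std::uint16_t kMap[][5] = {
  {0x0627, 0xFE8D, 0x0000, 0x0000, 0xFE8E},  // ALEF
  {0x0628, 0xFE8F, 0xFE91, 0xFE92, 0xFE90},  // BEH
};

// Builds باب in visual order for a left-to-right renderer (an assumption):
// isolated BEH, final ALEF, initial BEH.
void buildBab(char* out, std::size_t outSize) {
  assert(outSize > 0);
  out[0] = '\0';
  appendUtf8(out, outSize, kMap[1][1]);  // BEH, isolated
  appendUtf8(out, outSize, kMap[0][4]);  // ALEF, final
  appendUtf8(out, outSize, kMap[1][2]);  // BEH, initial
}
```

Encoding 0x0628 this way yields the two bytes 0xD8 0xA8 mentioned above, while 0xFE8F yields the three bytes 0xEF 0xBA 0x8F.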
If you use String for your map, you can easily build strings from it:
String output = Map[i][2] + Map[j][3] + Map[k][4];
If you use uint16_t, then you have to split the integer value into two bytes to add a character:
uint16_t v = pgm_read_word(&Map[i][j]); // read from PROGMEM, e.g. v = 0xD8A8
char hi = v >> 8;   // high byte, the D8 part of 0xD8A8: hi = 0xD8 (first UTF-8 byte)
char lo = v & 0xFF; // low byte, the A8 part of 0xD8A8: lo = 0xA8 (second UTF-8 byte)
String output = String(hi) + String(lo); // output = {0xD8, 0xA8}
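Lastly, U8g2::drawUTF8() expects a C string (const char*), not a String, so you will have to convert the String first. You can do that by calling output.c_str(), which returns the underlying char buffer. If you would rather keep using u8g2.print() as in your code, call u8g2.enableUTF8Print() once in setup() so that print() decodes the String as UTF-8. Either way, the selected font must contain the Arabic glyphs, for example u8g2_font_unifont_t_arabic.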