I am building a hash that lets you look up a character's description (like the ones below) by passing it a QString containing that character.
I have a full list of the relevant data, which looks something like this:
QHash<QString, QString> lookupCharacterDescription;
...
lookupCharacterDescription.insert("003F","QUESTION MARK");
lookupCharacterDescription.insert("0040","COMMERCIAL AT");
lookupCharacterDescription.insert("0041","LATIN CAPITAL LETTER A");
lookupCharacterDescription.insert("0042","LATIN CAPITAL LETTER B");
...
lookupCharacterDescription.insert("1F648","SEE-NO-EVIL MONKEY");
lookupCharacterDescription.insert("1F649","HEAR-NO-EVIL MONKEY");
lookupCharacterDescription.insert("1F64A","SPEAK-NO-EVIL MONKEY");
lookupCharacterDescription.insert("1F64B","HAPPY PERSON RAISING ONE HAND");
...
lookupCharacterDescription.insert("FFFD","REPLACEMENT CHARACTER");
lookupCharacterDescription.insert("FFFE","<not a character>");
lookupCharacterDescription.insert("FFFF","<not a character>");
lookupCharacterDescription.insert("FFFFE","<not a character>");
lookupCharacterDescription.insert("FFFFF","<not a character>");
Now obviously "1F64B" needs to be wrapped in something here. I have tried playing around with things like 0x1F64B as a QChar, but I am honestly groping in the dark here. I could make it work with the lower values like the Latin letters, but it fails with the 5-character addresses.
1F64B?
When you use QString(0x1F64B), it calls QString::QString(QChar ch). Since QChar is a 16-bit type, the value is truncated to 0xF64B, and you get a meaningless character because U+F64B is a private-use code point with no standard assignment. I'm pretty sure you'll get an out-of-range warning at that line. You can see the value F64B in the glyph that gets displayed if you zoom in, or by using a hex editor. Since 0x1F64B cannot fit into a single 16-bit QChar and must be represented by a surrogate pair, you can't initialize the string that way.
OTOH, QString("🙋") works since it constructs the string from another string. You must construct the string from a string like that, or manually by assigning the UTF-8/UTF-16 code units.
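The truncation described above can be reproduced without Qt. This is only a minimal standalone sketch: it assumes that masking to the low 16 bits mimics what happens when 0x1F64B is forced into a 16-bit code unit, and the helper names truncateTo16Bits, isPrivateUse and needsSurrogatePair are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep only the low 16 bits of a code point,
// mimicking what happens when it is forced into a 16-bit code unit.
inline std::uint16_t truncateTo16Bits(std::uint32_t codePoint)
{
    return static_cast<std::uint16_t>(codePoint & 0xFFFFu);
}

// Hypothetical helper: true for the BMP Private Use Area, U+E000..U+F8FF.
inline bool isPrivateUse(std::uint32_t codePoint)
{
    return codePoint >= 0xE000u && codePoint <= 0xF8FFu;
}

// Hypothetical helper: code points above U+FFFF need a surrogate pair in UTF-16.
inline bool needsSurrogatePair(std::uint32_t codePoint)
{
    assert(codePoint <= 0x10FFFFu);  // anything larger is not a Unicode code point
    return codePoint > 0xFFFFu;
}
```

Here truncateTo16Bits(0x1F64B) yields 0xF64B, which isPrivateUse reports as a private-use code point, while needsSurrogatePair(0x1F64B) is true.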
Is this considered UTF-32?
No. UTF-32 is a Unicode encoding that uses 32-bit code units. You are working with a QString, not a bare byte array, so you don't need to care about its underlying encoding (which is actually UTF-16).
What can I wrap this value "1F64B" in to produce the QString("🙋")?
You shouldn't handle the numeric values as strings. Store them as a numeric type instead:
QHash<qint32, QString> lookupCharacterDescription;
lookupCharacterDescription.insert(0x1F64B, "HAPPY PERSON RAISING ONE HAND");
Then, to make a string that contains the character at code point 0x1F64B, use:
uint cp = 0x1F64B;
QString mystr = QString::fromUcs4(&cp, 1);
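If the existing data is only available as hex strings such as "1F64B", the keys can be converted to numbers before they go into the hash. The following is a rough sketch in plain standard C++, not Qt API; parseCodePoint is a hypothetical helper name, and limiting input to six hex digits and to values up to 0x10FFFF is an assumption about what counts as a valid key.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical helper: parse a hex key such as "1F64B" into a code point.
// Returns false for empty, over-long, non-hex, or out-of-range input.
inline bool parseCodePoint(const std::string& hex, std::uint32_t& codePoint)
{
    if (hex.empty() || hex.size() > 6)
        return false;
    for (char c : hex) {
        if (!std::isxdigit(static_cast<unsigned char>(c)))
            return false;
    }
    // Safe to call: the input is 1 to 6 hex digits, so it cannot throw.
    const unsigned long value = std::stoul(hex, nullptr, 16);
    if (value > 0x10FFFFul)  // beyond the last Unicode code point
        return false;
    codePoint = static_cast<std::uint32_t>(value);
    return true;
}
```

The resulting number can then be used as the hash key and, when a string is needed, passed to QString::fromUcs4 as shown above.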
Will the wrappings also work for the lower values?
Yes, since UCS-4 (a.k.a. UTF-32) can store any Unicode character.
Alternatively, you can construct the character from UTF-16 or UTF-8. U+1F64B is encoded in UTF-16 as D83D DE4B, or as F0 9F 99 8B in UTF-8, so you can use either of the following:
QChar utf16[2] = { QChar(0xD83D), QChar(0xDE4B) };  // high and low surrogate
QString str1 = QString(utf16, 2);
const char utf8[4] = { '\xF0', '\x9F', '\x99', '\x8B' };  // UTF-8 bytes of U+1F64B
QString str2 = QString::fromUtf8(utf8, 4);
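The code units quoted above (D83D DE4B for UTF-16 and F0 9F 99 8B for UTF-8) do not have to be looked up; they follow from the standard encoding rules. The sketch below is a standalone illustration in plain C++ rather than anything Qt provides, and encodeUtf16 and encodeUtf8 are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encode one code point as UTF-16 code units.
inline std::u16string encodeUtf16(std::uint32_t cp)
{
    // Surrogate code points and values above U+10FFFF cannot be encoded.
    assert(cp <= 0x10FFFFu && !(cp >= 0xD800u && cp <= 0xDFFFu));
    std::u16string out;
    if (cp <= 0xFFFFu) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        const std::uint32_t v = cp - 0x10000u;  // 20 bits remain
        out.push_back(static_cast<char16_t>(0xD800u + (v >> 10)));     // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00u + (v & 0x3FFu)));  // low surrogate
    }
    return out;
}

// Hypothetical helper: encode one code point as UTF-8 bytes.
inline std::string encodeUtf8(std::uint32_t cp)
{
    assert(cp <= 0x10FFFFu && !(cp >= 0xD800u && cp <= 0xDFFFu));
    std::string out;
    if (cp < 0x80u) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800u) {
        out.push_back(static_cast<char>(0xC0u | (cp >> 6)));
        out.push_back(static_cast<char>(0x80u | (cp & 0x3Fu)));
    } else if (cp < 0x10000u) {
        out.push_back(static_cast<char>(0xE0u | (cp >> 12)));
        out.push_back(static_cast<char>(0x80u | ((cp >> 6) & 0x3Fu)));
        out.push_back(static_cast<char>(0x80u | (cp & 0x3Fu)));
    } else {
        out.push_back(static_cast<char>(0xF0u | (cp >> 18)));
        out.push_back(static_cast<char>(0x80u | ((cp >> 12) & 0x3Fu)));
        out.push_back(static_cast<char>(0x80u | ((cp >> 6) & 0x3Fu)));
        out.push_back(static_cast<char>(0x80u | (cp & 0x3Fu)));
    }
    return out;
}
```

For 0x1F64B, encodeUtf16 returns the two units D83D DE4B and encodeUtf8 returns the four bytes F0 9F 99 8B, matching the values used with QString above.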
If you want to include the string in its literal form in source code, then either of the following will work:
str1 = QString::fromWCharArray(L"\xD83D\xDE4B");  // assumes a 16-bit wchar_t, as on Windows
str2 = QString::fromUtf8("\xF0\x9F\x99\x8B");
If you have C++11 support, simply use the prefixes u8, u, and U for UTF-8, UTF-16, and UTF-32 respectively, like this:
QString::fromUtf8(u8"🙋");
QString::fromUtf16(u"🙋");
QString::fromUtf16(u"\xD83D\xDE4B");  // \u cannot name surrogates, so use \x escapes here
QString::fromUtf16(u"\U0001F64B");
QString::fromUcs4(U"🙋");
QString::fromUcs4(U"\U0001F64B");
QString::fromUcs4(U"🙋", 1);
QString::fromUcs4(U"\U0001F64B", 1);
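To see what the three prefixes actually produce, the compiler can count the code units in each literal. This is a small sketch that needs only a C++11 compiler and no Qt; codeUnits is a hypothetical helper introduced for the illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// not counting the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t codeUnits(const CharT (&)[N])
{
    return N - 1;
}

// The same code point, U+1F64B, in the three encodings.
static_assert(codeUnits(u8"\U0001F64B") == 4, "UTF-8 needs four bytes");
static_assert(codeUnits(u"\U0001F64B") == 2, "UTF-16 needs a surrogate pair");
static_assert(codeUnits(U"\U0001F64B") == 1, "UTF-32 needs one code unit");

// A character inside the BMP fits in a single UTF-16 code unit.
static_assert(codeUnits(u"\u00E9") == 1, "BMP characters need one UTF-16 unit");
```

This is why the size argument in the fromUcs4 calls above is 1: a UTF-32 literal holds the whole code point in a single unit, while the UTF-16 literal holds two.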
Mandatory article to understand text and encodings: There Ain't No Such Thing as Plain Text