Task
At the moment I am porting old DOS code for a device to Linux, in pure C. Text is drawn on the surface using bitmap fonts. I wrote a function that takes a Unicode code point and draws the corresponding glyph. I have tested it with various ASCII and non-ASCII characters, and it works. The old source code used a DOS encoding, but I am trying to use UTF-8 because multi-language support is desired. I cannot use SDL_ttf or similar libraries because the glyphs they produce are not "precise" enough, so I have to stick with bitmap fonts.
Issue
I wrote a small C program to check the conversion of multibyte characters to their Unicode code points (inspired by http://en.cppreference.com/w/c/string/multibyte/mbrtowc).
#include <stdio.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>
#include <stdint.h>
int main(void)
{
    size_t n = 0, x = 0;
    setlocale(LC_CTYPE, "en_US.utf8");
    mbstate_t state = {0};
    char in[] = "!°水"; // or u8"zß水"
    size_t in_sz = sizeof(in) / sizeof (*in);
    printf("Processing %zu UTF-8 code units: [ ", in_sz);
    for(n = 0; n < in_sz; ++n)
    {
        printf("%#x ", (unsigned char)in[n]);
    }
    puts("]");
    wchar_t out[in_sz];
    char* p_in = in, *end = in + in_sz;
    wchar_t *p_out = out;
    int rc = 0;
    while((rc = mbrtowc(p_out, p_in, end - p_in, &state)) > 0)
    {
        p_in += rc;
        p_out += 1;
    }
    size_t out_sz = p_out - out + 1;
    printf("into %zu wchar_t units: [ ", out_sz);
    for(x = 0; x < out_sz; ++x)
    {
        printf("%u ", (unsigned short)out[x]);
    }
    puts("]");
}
On my desktop Linux computer the output is as expected:
Processing 7 UTF-8 code units: [ 0x21 0xc2 0xb0 0xe6 0xb0 0xb4 0 ]
into 4 wchar_t units: [ 33 176 27700 0 ]
When I run the same code on my embedded Linux device, I get the following output:
Processing 7 UTF-8 code units: [ 0x21 0xc2 0xb0 0xe6 0xb0 0xb4 0 ]
into 2 wchar_t units: [ 33 55264 ]
After the ! character, mbrtowc returns -1 ((size_t)-1), which, according to the documentation, indicates an encoding error. I tested this with different characters, and the error occurs only with non-ASCII characters. It never occurs on my desktop Linux computer.
Additional Information
I am using a PFM-540I Rev. B as the PC board in the embedded device. The Linux distribution is built with Buildroot.
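Answer
You need to make sure that the en_US.utf8 locale is available on the embedded Linux build. If it is not, setlocale(LC_CTYPE, "en_US.utf8") fails and returns NULL, the program keeps running in the "C" locale, and in that locale the C library typically treats bytes above 0x7F as invalid, so mbrtowc returns (size_t)-1 right after the "!". By default, Buildroot limits the locales installed on the system in two ways:
- Only the locales listed in the BR2_GENERATE_LOCALE configure option are generated. By default, this list is empty, so you only get the C locale. Set this config option to en_US.UTF-8.
- Locales that are not listed in BR2_ENABLE_LOCALE_WHITELIST are removed from the target. en_US is already in the default value, so you probably don't need to change this.
Note that if you change these configuration options, you need to make a completely clean build (with make clean; make) for the change to take effect.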