I'm writing a Node.js frontend for a DAB development board, which will eventually run on a Raspberry Pi. I am a Java and web developer, and I'm struggling with C++ and converting between different types of strings.
The DAB board comes with a C++ SDK that has a number of handy functions. It lets me get the number of available programs with GetTotalProgram(). For each program I can call GetProgramName to get the program's name:
GetProgramName(char mode, long dabIndex, char namemode, wchar_t * programName)
... where mode selects FM or DAB, and namemode selects the long or short name. The program's name is returned in programName.
To convert the wchar_t *programName into a v8::String, I found this snippet, which I'm using and understand the basics of:
wchar_t buff[300];
char cbuff[600];
GetProgramName(0, i, 1, buff);
wcstombs( cbuff, buff, wcslen(buff) );
Local<String> str = String::NewFromUtf8(isolate, (const char *) cbuff, v8::String::kNormalString, wcslen(buff));
I iterate through the available programs and build up a v8::Array:
void GetPrograms(const FunctionCallbackInfo<Value>& args) {
    Isolate* isolate = Isolate::GetCurrent();
    HandleScope scope(isolate);
    wchar_t buff[300];
    char cbuff[600];
    int numberOfPrograms, i;
    numberOfPrograms = GetTotalProgram();
    Local<v8::Array> ARRAY = Array::New(isolate, numberOfPrograms);
    for (i = 0; i < numberOfPrograms; i++) {
        if (GetProgramName(0, i, 1, buff)) {
            wcstombs( cbuff, buff, wcslen(buff) );
            Local<String> str = String::NewFromUtf8(isolate, (const char *) cbuff, v8::String::kNormalString, wcslen(buff));
            Local<Object> obj = Object::New(isolate);
            obj->Set(String::NewFromUtf8(isolate, "name"), str);
            ARRAY->Set(i, obj);
        }
    }
    args.GetReturnValue().Set(ARRAY);
}
I call the C++ method from my Node app:
var programs = ext.getPrograms();
for (var i = 0; i < programs.length; i++) {
console.log(programs[i].name);
}
This mostly works, but when a program's name contains a non-ASCII character, like Æ, Ø or Å, the following elements in ARRAY get a borked name.
Here's what the Node snippet actually outputs (console.log), compared to the expected output:
| ACTUAL | EXPECTED |
| --------- | ---------- |
| NRK SUPER | NRK SUPER |
| NRK VUPER | NRK VÆR |
| NRK P1 ER | NRK P1 |
It seems as though the non-ASCII character causes the next wcstombs call to quit early, not copying the later characters.
Why does this happen? Is there a better way to create a v8::String from my wchar_t?
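One thing that may contribute to leftovers from an earlier name showing up in a later one: when the third argument of wcstombs equals the number of characters converted, wcstombs does not write a terminating '\0', so a reused buffer keeps its old tail. The sketch below shows only that effect with plain ASCII input. It is not an exact reproduction of the table above, and the helper names convertLikeTheLoop and convertWithCapacity are made up for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <cwchar>
#include <string>

// Hypothetical helper: converts `src` into a reused buffer the same way the
// loop in the question does (n = wcslen(src)), then reads it as a C string.
std::string convertLikeTheLoop(char* reused, const wchar_t* src) {
    assert(reused != nullptr && src != nullptr);
    // With n equal to the character count there is no room left for the
    // terminating '\0', so wcstombs does not write one.
    std::wcstombs(reused, src, std::wcslen(src));
    // Reads up to the first '\0' that was already in the buffer.
    return std::string(reused);
}

// Hypothetical helper: same conversion, but n is the buffer capacity in
// bytes, so the terminator fits and the returned byte count can be used.
std::string convertWithCapacity(char* reused, std::size_t capacity,
                                const wchar_t* src) {
    assert(reused != nullptr && src != nullptr && capacity > 0);
    std::size_t written = std::wcstombs(reused, src, capacity);
    if (written == static_cast<std::size_t>(-1)) {
        return std::string();  // a character could not be converted
    }
    return std::string(reused, written);
}
```

With a buffer that still holds "NRK SUPER", convertLikeTheLoop(buf, L"NRK P1") yields "NRK P1PER", while convertWithCapacity(buf, sizeof(buf), L"NRK P1") yields "NRK P1".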
Note: I have now isolated this problem to the wcstombs function when running on the Raspberry Pi. The following code:
#include <stdio.h>
#include <string>
#include <cstring>
#include <cstdlib>
char cbuff[600];
wchar_t buff[300] = L"ABCø123abc";
int main( int argc, const char* argv[] ) {
wcstombs( cbuff, buff, wcslen(buff) );
wprintf(L"wcslen of wchar_t array: %u - strlen of char array: %u\n", (char) wcslen(buff), strlen(cbuff));
}
when run on a Mac, outputs
wcslen of wchar_t array: 10 - strlen of char array: 10
but when run on the Raspberry Pi, outputs
wcslen of wchar_t array: 10 - strlen of char array: 3
That is, it counts only the characters before the ø character.
This looks similar to this unanswered question.
The problem was in the wcstombs( cbuff, buff, wcslen(buff) ) call, which stopped copying characters when it encountered a non-ASCII character. The docs say: "The behavior of this function depends on the LC_CTYPE category of the selected C locale."
A program starts in the "C" locale unless it calls setlocale, so setting the locale to a UTF-8 variant solved the problem:
setlocale(LC_CTYPE, "C.UTF-8");
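Both setlocale and wcstombs report failure through their return values, so it can help to check them instead of assuming the locale exists on the target system. Here is a minimal sketch under that assumption. The helper names selectUtf8Locale and toMultibyte are hypothetical, and the "en_US.UTF-8" fallback is only a guess at a locale that might be installed where "C.UTF-8" is not.

```cpp
#include <cassert>
#include <clocale>
#include <cstdlib>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: tries a couple of UTF-8 locale names, because which
// ones are installed varies between systems. Returns true if one is accepted.
bool selectUtf8Locale() {
    const char* candidates[] = {"C.UTF-8", "en_US.UTF-8"};
    for (const char* name : candidates) {
        // setlocale returns a null pointer if the locale is not available.
        if (std::setlocale(LC_CTYPE, name) != nullptr) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: converts `src` using the current LC_CTYPE locale.
// Returns false if wcstombs hits a character it cannot convert.
bool toMultibyte(const wchar_t* src, std::string& out) {
    assert(src != nullptr);
    // Worst case: every character needs MB_CUR_MAX bytes, plus the terminator.
    std::vector<char> buf(std::wcslen(src) * MB_CUR_MAX + 1);
    std::size_t written = std::wcstombs(buf.data(), src, buf.size());
    if (written == static_cast<std::size_t>(-1)) {
        return false;
    }
    out.assign(buf.data(), written);  // byte count, not character count
    return true;
}
```

Calling selectUtf8Locale() once at startup and then checking the result of toMultibyte for each name makes a missing locale visible as an error rather than as a truncated string.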
Having done this, I can now create v8::Strings this way:
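wchar_t buff[300] = L"Something non-ASCII ÆØÅ here";
char cbuff[600];
// The third argument is the capacity of cbuff in bytes, not the number of wide characters.
size_t bytes = wcstombs( cbuff, buff, sizeof(cbuff) );
if (bytes == (size_t) -1) bytes = 0; // conversion failed: fall back to an empty string
// Pass the UTF-8 byte count; it is larger than wcslen(buff) because Æ, Ø and Å take two bytes each.
Local<String> str = String::NewFromUtf8(isolate, (const char *) cbuff, v8::String::kNormalString, (int) bytes);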
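As for a "better way", one option that does not depend on the process locale at all is to encode the wide string to UTF-8 by hand. This is only a sketch, and it assumes that wchar_t is 32 bits wide and holds Unicode code points, as is usual on Linux. The function name wideToUtf8 is made up for illustration and is not part of the SDK or of V8.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes a null-terminated wide string as UTF-8.
// Assumes each wchar_t is one Unicode code point (32-bit wchar_t).
std::string wideToUtf8(const wchar_t* src) {
    assert(src != nullptr);
    std::string out;
    for (; *src != L'\0'; ++src) {
        std::uint32_t cp = static_cast<std::uint32_t>(*src);
        if (cp < 0x80) {
            // 1 byte: plain ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // 2 bytes: covers Æ, Ø, Å and the rest of Latin-1
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            // Lone surrogates are not valid code points: emit U+FFFD.
            out += "\xEF\xBF\xBD";
        } else if (cp < 0x10000) {
            // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp <= 0x10FFFF) {
            // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // Outside the Unicode range: emit U+FFFD.
            out += "\xEF\xBF\xBD";
        }
    }
    return out;
}
```

The resulting std::string could then be handed to String::NewFromUtf8 with its data() pointer and its size() as the length, so the byte count always matches the encoded text.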