I have this code:
static int main(string[] args) {
    info(escape_latex(args[1]));
    return 0;
}

string escape_latex(string input) {
    var builder = new StringBuilder.sized(input.length + 20);
    var map = new Gee.HashMap<string, string>();
    // ...<Snip>...
    // Fix for some weird unicode bugs
    map["\xff\xbf\xbf\xbf\xbf\xbf"] = "";
    info("Len: %d", input.char_count());
    for(var i = 0; i < input.char_count(); i++) {
        var ic = input.get_char(i);
        var as_string = ic.to_string();
        info("%d %s", i, as_string);
        if(map.has_key(as_string)) {
            builder.append(map[as_string]);
        } else {
            builder.append_unichar(ic);
        }
    }
    return builder.str;
}
If I pass "foo123", I get the expected output "foo123". But if I pass "Geldbeutel+Schlüsselanhänger", I get the output "Geldbeutel+Schl?sselanh?ng" (the last two characters are missing).
Then I changed the for-loop to:
    for(var i = 0; i <= input.char_count(); i++) {
For "foo123" I still get the expected output; for "Geldbeutel+Schlüsselanhänger" I get "Geldbeutel+Schl?sselanh?nge". (Valgrind, ASAN and UBSAN don't report anything.)
Then I changed the for-loop to:
    for(var i = 0; i <= input.char_count() + 1; i++) {
Now "foo123" becomes "foo123G", because I run over into other memory, but "Geldbeutel+Schlüsselanhänger" gives the correct output "Geldbeutel+Schl?sselanh?nger".
For the last example, the log output is:
** INFO: 19:41:57.903: a.vala:23: Len: 28
** INFO: 19:41:57.903: a.vala:29: 0 G
** INFO: 19:41:57.903: a.vala:29: 1 e
** INFO: 19:41:57.903: a.vala:29: 2 l
** INFO: 19:41:57.903: a.vala:29: 3 d
** INFO: 19:41:57.903: a.vala:29: 4 b
** INFO: 19:41:57.903: a.vala:29: 5 e
** INFO: 19:41:57.903: a.vala:29: 6 u
** INFO: 19:41:57.903: a.vala:29: 7 t
** INFO: 19:41:57.903: a.vala:29: 8 e
** INFO: 19:41:57.903: a.vala:29: 9 l
** INFO: 19:41:57.903: a.vala:29: 10 +
** INFO: 19:41:57.903: a.vala:29: 11 S
** INFO: 19:41:57.903: a.vala:29: 12 c
** INFO: 19:41:57.903: a.vala:29: 13 h
** INFO: 19:41:57.903: a.vala:29: 14 l
** INFO: 19:41:57.903: a.vala:29: 15 ?
** INFO: 19:41:57.903: a.vala:29: 17 s
** INFO: 19:41:57.903: a.vala:29: 18 s
** INFO: 19:41:57.903: a.vala:29: 19 e
** INFO: 19:41:57.903: a.vala:29: 20 l
** INFO: 19:41:57.903: a.vala:29: 21 a
** INFO: 19:41:57.903: a.vala:29: 22 n
** INFO: 19:41:57.903: a.vala:29: 23 h
** INFO: 19:41:57.903: a.vala:29: 24 ?
** INFO: 19:41:57.903: a.vala:29: 26 n
** INFO: 19:41:57.903: a.vala:29: 27 g
** INFO: 19:41:57.903: a.vala:29: 28 e
** INFO: 19:41:57.903: a.vala:29: 29 r // <- Here, I access an invalid index, but it works
** INFO: 19:41:57.903: a.vala:2: Geldbeutel+Schl?sselanh?nger
It seems to be related to Unicode, but I can't find a way to make this function work.
This has to do with the locale: the default for the C runtime environment is US-ASCII, which is why the umlauts are printed as "?". You can switch to the user's preferred locale by passing an empty string to Intl.setlocale() for LocaleCategory.ALL. Those are also the default parameter values, so a plain Intl.setlocale(); will work:
static int main(string[] args) {
    // Use the user's preferred locale instead of the default "C" locale
    Intl.setlocale();
    print(escape_latex(args[1]) + "\n");
    return 0;
}

string escape_latex(string input) {
    var builder = new StringBuilder.sized(input.length + 20);
    var map = new Gee.HashMap<string, string>();
    // ...<Snip>...
    // Fix for some weird unicode bugs
    map["\xff\xbf\xbf\xbf\xbf\xbf"] = "";
    info("Len: %d", input.char_count());
    // get_char() takes a byte offset, not a character index, so walk the
    // string with get_next_char(), which advances i by one whole character
    int i = 0;
    unichar ic;
    while(input.get_next_char(ref i, out ic)) {
        var as_string = ic.to_string();
        // i is now the byte offset just past the current character
        info("%d %s", i, as_string);
        if(map.has_key(as_string)) {
            builder.append(map[as_string]);
        } else {
            builder.append_unichar(ic);
        }
    }
    return builder.str;
}
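The missing trailing characters in the question are most likely a separate issue from the locale: char_count() counts characters, while get_char() expects a byte offset, and each umlaut occupies two bytes in UTF-8. The following is only a minimal sketch of that mismatch, assuming the source file is saved as UTF-8; count_by_iteration and the sample string are hypothetical names chosen for illustration and are not part of the original code.

```vala
// Hypothetical helper: visit every character with get_next_char() and
// return how many characters were seen.
int count_by_iteration(string input) {
    int offset = 0;   // byte offset, advanced by get_next_char()
    int start = 0;    // byte offset where the current character begins
    int seen = 0;
    unichar c;
    while (input.get_next_char(ref offset, out c)) {
        print("byte %d: %s\n", start, c.to_string());
        start = offset;
        seen++;
    }
    return seen;
}

int main(string[] args) {
    Intl.setlocale();
    string s = "Schlüssel";
    // 9 characters but 10 bytes, because "ü" takes two bytes in UTF-8
    print("bytes: %d, chars: %d\n", s.length, s.char_count());
    print("visited: %d\n", count_by_iteration(s));
    return 0;
}
```

A loop that runs up to char_count() and passes the counter to get_char() therefore stops one byte short for every two-byte character, which would explain why the 28-character, 30-byte input in the question lost its last two characters.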