Force UTF-8 encoding in glib's "g_print()"


Short question: Is there a way to force glib's g_print() to use UTF-8 encoding?


The problem I hit is that g_print() seems to do character set conversion based on the return value of g_get_charset(). Unfortunately, the documentation states:

On Windows the character set returned by this function is the so-called system default ANSI code-page.

However, modern consoles are available nowadays: MSYS consoles typically support (and use) UTF-8 by default, and even the default Windows console can be set to use UTF-8 encoding.

So now that Windows has finally caught up, glib is limiting me to an 8-bit code page after all?
I would simply switch my code to plain printf, but g_print is called in many places inside the glib and gtk libraries, as well as in their C++ bindings glibmm and gtkmm. I have no easy way to change that short of patching and compiling glib myself, so I really hope there is a solution.


Note: I just saw the part calling local_glib_print_func() in the definition of g_print(). Does anybody know what this is about and whether I could exploit it for my purposes?


Solution

  • Well, in fact I gave myself the correct hint:

    While investigating the Note in my question I discovered the function g_set_print_handler, which installs an arbitrary handler that replaces the default mechanism and thereby also bypasses the character set conversion.

    The following minimal print handler lets me print to the console with g_print() while avoiding any unwanted character set conversion:

    #include <cstdio>
    #include <glib.h>
    
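    // Print handler that writes the string to stdout unchanged,
    // bypassing the default character set conversion of g_print().
    static void g_print_no_convert(const gchar *buf)
    {
        fputs(buf, stdout);
    }
    
    int main()
    {
        // From here on, every g_print() call is routed through the handler.
        g_set_print_handler(g_print_no_convert);
        g_print("UTF-8 string\n");
    
        return 0;
    }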
    

    Note: Writing UTF-8 strings obviously only works if your console's encoding is in fact UTF-8.


    On Windows you can set your console's encoding to UTF-8 either manually, by executing the command chcp 65001, or programmatically with the following API functions:

    #include <windows.h>
    
    // temporarily switch the console output code page to UTF-8
    const unsigned int initial_cp = GetConsoleOutputCP();
    SetConsoleOutputCP(CP_UTF8);
    
    {...} // printing
    
    // switch back to initial console encoding
    SetConsoleOutputCP(initial_cp);
    

    This approach makes it easy to print UTF-8 strings to the Windows console (tested with the default console as well as MSYS2's terminal on Windows 10).
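
The save/switch/restore pattern from the Windows snippet can also be wrapped in a small scope guard, so that the original code page is restored even on an early return or an exception. The following is only a minimal sketch: `ConsoleUtf8Guard`, `get_console_cp`, `set_console_cp`, `kUtf8CodePage` and `print_utf8_sample` are hypothetical names introduced for illustration, and the non-Windows branch simply assumes a terminal that already uses UTF-8.

```cpp
#include <cstdio>

#ifdef _WIN32
#include <windows.h>
// Thin wrappers around the two Win32 calls used for the console code page.
unsigned int get_console_cp() { return GetConsoleOutputCP(); }
bool set_console_cp(unsigned int cp) { return SetConsoleOutputCP(cp) != 0; }
const unsigned int kUtf8CodePage = CP_UTF8; // 65001
#else
// Assumption for this sketch: other platforms already use a UTF-8 terminal,
// so the wrappers are no-ops that merely report code page 65001.
const unsigned int kUtf8CodePage = 65001;
unsigned int get_console_cp() { return kUtf8CodePage; }
bool set_console_cp(unsigned int) { return true; }
#endif

// Hypothetical helper (not part of glib or the Windows API): switches the
// console output code page to UTF-8 on construction and restores the
// previously active code page on destruction.
class ConsoleUtf8Guard
{
public:
    ConsoleUtf8Guard() : initial_cp_(get_console_cp())
    {
        set_console_cp(kUtf8CodePage);
    }

    ~ConsoleUtf8Guard()
    {
        set_console_cp(initial_cp_);
    }

    // Non-copyable: exactly one object is responsible for restoring.
    ConsoleUtf8Guard(const ConsoleUtf8Guard &) = delete;
    ConsoleUtf8Guard &operator=(const ConsoleUtf8Guard &) = delete;

private:
    unsigned int initial_cp_;
};

// Example use: output written while the guard is alive goes to a console
// whose output code page is UTF-8.
void print_utf8_sample()
{
    ConsoleUtf8Guard guard;
    std::fputs("\xC3\xA4\n", stdout); // U+00E4 encoded as two UTF-8 bytes
}
```

Declaring one such guard at the top of main() would cover all g_print() output of the program, provided a non-converting print handler like the one above has been installed.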