
Setlocale not working


I'm trying to use the setlocale function so I can use Portuguese characters in the Windows console.

This is my code:

#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

int main()
{
    setlocale(LC_ALL, "Portuguese");
    printf("Bem-Vindo ao CALCULADORA SIMULATOR 2018 - FOSÓRIO EDITION\n");
}

But this is my output:

Bem-Vindo ao CALCULADORA SIMULATOR 2018 - FOSÃ“RIO EDITION

Every other text that is written in cmd is shown correctly, only my program's output has this problem.

It looks like every accented character is changed to "Ã" plus another seemingly random character. For example, printf("áàóõãÃ\n"); outputs this:

Ã¡Ã Ã³ÃµÃ£Ãƒ


Solution

  • First, pass setlocale a locale name built from an ISO 639-1 language code (such as "pt") rather than the full English name of the language. Optional suffixes let you, for instance, distinguish Brazilian from European Portuguese; the complete syntax is documented on MSDN under "Locale Names, Languages, and Country/Region Strings".

    Second, the output you got, with each of the accented letters ÓáàóõãÃ becoming a two-character sequence starting with Ã, is a characteristic mojibake pattern for UTF-8 being misinterpreted as Windows-1252. UTF-8 is a variable-length encoding of Unicode "codepoints", in which each of the accented characters you're trying to use becomes a two-byte sequence; Windows-1252 is a single-byte encoding, so each of those byte pairs is misread as two separate characters. Here's how it happens for those specific characters:

    character   codepoint    UTF-8 two-byte sequence   Windows-1252
    ---------   ---------    -----------------------   ------------
    Ó           U+00D3       0xC3 0x93                 Ã “
    á           U+00E1       0xC3 0xA1                 Ã ¡
    à           U+00E0       0xC3 0xA0                 Ã □
    ó           U+00F3       0xC3 0xB3                 Ã ³
    õ           U+00F5       0xC3 0xB5                 Ã µ
    ã           U+00E3       0xC3 0xA3                 Ã £
    Ã           U+00C3       0xC3 0x83                 Ã ƒ
    

    (The white square on the à line stands in for a non-breaking space.)

    This is a typical way for "narrow" text output to be mangled by the Windows console. Windows uses UTF-16 for almost everything internally, which means it often works better to use C's "wide character" library. Try this program instead:

    #include <wchar.h>
    #include <locale.h>
    
    int main(void)
    {
        setlocale(LC_ALL, "pt"); // also try "pt_BR"
        wprintf(L"Bem-Vindo ao CALCULADORA SIMULATOR 2018 - FOSÓRIO EDITION\n");
    }
    

    Note: Most other operating systems were slower to take the Unicode plunge than Windows was, and came to it only after it had become obvious that UTF-8 was a better choice than UTF-16, which means the "wide character" library should be avoided on all operating systems except Windows. Don't worry about this until you need to write a program that works on both Windows and non-Windows.