How can I print a 128-255 character from an integer in C?


I'm still playing around with C to understand how it works.

I'm having trouble printing characters from the extended ASCII table (128-255). If I call printf("Â"), for example, it prints Â and everything works fine. However, if I assign the value to a variable, for instance a = 194, and then print it with printf("%c", a), it prints � instead of Â.

By the way, it works fine with characters 32-127 (for example, 35 prints #).

How could I print one of the 128-255 characters from an integer (decimal or binary)?

I am using GCC 11.3 on Ubuntu 20.04.1 LTS.


Solution

  • It is likely that both your compiler and your terminal use UTF-8 to encode non-ASCII characters.

    Character sets and encodings are a vast subject, with many different and incompatible conventions and implementations. Â is indeed encoded as 194 (0xC2) in legacy single-byte encodings such as ISO-8859-1 and Windows-1252. 194 is also its code point in the Unicode standard (U+00C2), which assigns numbers to more than 100,000 characters covering almost every writing system and symbol set in the world.

    There are different ways to represent these code points as sequences of bytes; the most widespread is UTF-8, used by the vast majority of web pages. ASCII characters (code points 0-127) are represented as single bytes, and characters with a greater code point use 2 to 4 bytes: a leading byte in the range C2 to F4 followed by 1 to 3 continuation bytes in the range 80 to BF. Â is encoded as C3 82, which means "Â" is actually a 2-byte string identical to "\xC3\x82". This also explains the �: printf("%c", 194) writes the single byte C2, which is only the first byte of a 2-byte sequence, so the terminal cannot decode it and displays the replacement character instead.

    You can verify this with the following code:

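    #include <stdio.h>
    #include <string.h>
    
    int main(void) {
        const char *s = "Â";            /* stored by the compiler as UTF-8 */
        size_t len = strlen(s);         /* number of bytes, not characters */
        printf("%s: len=%zu, bytes=", s, len);
        for (size_t i = 0; i < len; i++) {
            /* cast so bytes >= 0x80 are not negative when char is signed */
            printf("%02X%c", (unsigned char)s[i], " \n"[i == len - 1]);
        }
        return 0;
    }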
    

    The output should be Â: len=2, bytes=C3 82.

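    To convert non-ASCII characters to UTF-8 sequences on output streams, you can select a UTF-8 locale with setlocale() from <locale.h> and print the code point with the %lc conversion, which converts a wide character to the locale's multibyte encoding: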

        setlocale(LC_ALL, "en_US.UTF-8");   /* needs <locale.h> */
        printf("%lc\n", (wint_t)194);       /* wint_t is declared in <wchar.h> */
    

    Output:

    Â
    

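    If the terminal's locale is configured correctly (for example LANG=en_US.UTF-8, which you can check with the locale command), you can instead select it with setlocale(LC_ALL, ""), which takes the locale from the LC_ALL, LC_* and LANG environment variables.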