
How to compress Non-ASCII characters to 1 byte in C for Linux?


I have a list of Turkish words and I need to compare their lengths. But since some Turkish characters are non-ASCII, I can't compare their lengths correctly: each non-ASCII Turkish character takes 2 bytes.

For example:

#include <stdio.h>
#include <string.h>

int main()
{
    char s1[] = "ab";
    char s2[] = "çş";

    printf("%zu\n", strlen(s1)); // it prints 2
    printf("%zu\n", strlen(s2)); // it prints 4

    return 0;
}

My friend said it's possible to do that in Windows with the line of code below:

system("chcp 1254");

He said that it maps the Turkish characters into the extended ASCII table. However, it doesn't work on Linux.

Is there a way to do that in Linux?


Solution

  • One possibility is to store the words as wide-character strings. This does not store each character in one byte, but it solves your main problem: it gives you a set of functions, such as wcslen, that count characters rather than bytes. The program would look like the following:

    #include <stdio.h>
    #include <string.h>
    #include <wchar.h>
    
    int main()
    {
        wchar_t s1[] = L"ab";
        wchar_t s2[] = L"çş";
    
        printf("%zu\n", wcslen(s1)); // it prints 2
        printf("%zu\n", wcslen(s2)); // it prints 2
    
        return 0;
    }