Tags: gcc, utf-8, character-encoding, utf-16

C file encoded in UTF-16 is not read properly by gcc


While doing some encoding tests, I saved a C file with the encoding 'UTF-16 LE' (using Sublime Text).

The C file contains the following:

#include <stdio.h>

int main(void) {
    const char *letter = "é";
    printf("%s\n", letter);
}

Compiling this file with gcc returns the error:

test.c:1:3: error: invalid preprocessing directive #i; did you mean #if?
    1 | # i n c l u d e   < s t d i o . h >

It's as if gcc inserted a space before each character when reading the C file.

My question is: can we submit C files encoded in some format other than UTF-8? Why was gcc not able to detect the encoding of my file and read it properly?


Solution

  • Because of a design choice.

    From the GNU CPP manual, section "Character sets":

    At present, GNU CPP does not implement conversion from arbitrary file encodings to the source character set. Use of any encoding other than plain ASCII or UTF-8, except in comments, will cause errors. Use of encodings that are not strict supersets of ASCII, such as Shift JIS, may cause errors even if non-ASCII characters appear only in comments. We plan to fix this in the near future.

    GCC was created to build GNU, so it comes from the Unix world, where UTF-16 is not an accepted encoding for ordinary text files. GNU also passes source files between different programs (e.g. CPP the preprocessor, GCC the compiler, etc.).

    Also, who uses UTF-16 for source files? It is an especially poor fit for C, which treats every \0 byte as the end of a string. Note that the encoding of the source code has nothing to do with the encoding the program uses at run time; that depends on the default locale (for reading files, printing strings, etc.).

    If this causes a problem, just use a pre-preprocessor (which is not so uncommon) to convert your source code into something gcc can read. The conversion stays hidden from you, so you can keep editing in UTF-16.
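
To make the pre-preprocessor idea concrete, here is a minimal sketch of the conversion step in C. In UTF-16 LE every ASCII character is followed by a zero byte, which is what shows up as the "spaces" in gcc's error message. The function names put_utf8 and utf16le_to_utf8 are hypothetical helpers invented for this illustration, not part of gcc or any library, and the sketch assumes the whole file has already been read into memory.

```c
#include <stddef.h>

/* Hypothetical helper: write one Unicode code point to out as UTF-8.
   Returns the number of bytes written (1 to 4). */
static size_t put_utf8(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: convert in_len bytes of UTF-16 LE to UTF-8.
   out must have room for at least 2 * in_len bytes.
   A leading byte order mark (FF FE) is skipped, unpaired surrogates
   are replaced by U+FFFD, and a trailing odd byte is ignored.
   Returns the number of bytes written to out. */
size_t utf16le_to_utf8(const unsigned char *in, size_t in_len,
                       unsigned char *out)
{
    size_t i = 0, n = 0;

    if (in_len >= 2 && in[0] == 0xFF && in[1] == 0xFE)
        i = 2; /* skip the BOM */

    while (i + 1 < in_len) {
        /* little-endian: low byte first, then high byte */
        unsigned long u = in[i] | ((unsigned long)in[i + 1] << 8);
        i += 2;

        if (u >= 0xD800 && u <= 0xDBFF) {
            /* high surrogate: needs a low surrogate right after it */
            unsigned long lo = 0;
            if (i + 1 < in_len)
                lo = in[i] | ((unsigned long)in[i + 1] << 8);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            } else {
                u = 0xFFFD;
            }
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            u = 0xFFFD; /* low surrogate without a preceding high one */
        }

        n += put_utf8(u, out + n);
    }
    return n;
}
```

A small wrapper around this could read test.c, convert it, and hand the result to gcc on standard input. If the iconv command-line tool is available, the same effect can probably be had without writing any code, along the lines of: iconv -f UTF-16LE -t UTF-8 test.c | gcc -x c -o test -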