c casting typedef

How can I get a warning when casting `int_least8_t` to `char`?


I am building a string library that supports both ASCII and UTF-8.
I created two typedefs, t_ascii and t_utf8. ASCII is safe to read as UTF-8, but UTF-8 is not safe to read as ASCII.
Is there any way to get a warning when implicitly converting from t_utf8 to t_ascii, but not when implicitly converting from t_ascii to t_utf8?

Ideally, I would want these warnings (and only these warnings) to be issued:

#include <stdint.h>

typedef char           t_ascii;
typedef uint_least8_t  t_utf8;

int main()
{
    t_ascii const* asciistr = "Hello world"; // Ok
    t_utf8 const*   utf8str = "你好世界";    // Ok

    asciistr = utf8str; // Warning: utf8 to ascii is not safe
    utf8str = asciistr; // Ok: ascii to utf8 is safe

    t_ascii asciichar = 'A';
    t_utf8   utf8char = 'B';

    asciichar = utf8char; // Warning: utf8 to ascii is not safe
    utf8char = asciichar; // Ok: ascii to utf8 is safe
}

Currently, when building with -Wall (and even with -funsigned-char), I get these warnings:

gcc main.c -Wall -Wextra                          
main.c: In function ‘main’:
main.c:10:35: warning: pointer targets in initialization of ‘const t_utf8 *’ {aka ‘const unsigned char *’} from ‘char *’ differ in signedness [-Wpointer-sign]
   10 |         t_utf8 const*   utf8str = "你好世界";    // Ok
      |                                   ^~~~~~~~~~
main.c:12:18: warning: pointer targets in assignment from ‘const t_utf8 *’ {aka ‘const unsigned char *’} to ‘const t_ascii *’ {aka ‘const char *’} differ in signedness [-Wpointer-sign]
   12 |         asciistr = utf8str; // Warning: utf8 to ascii is not safe
      |                  ^
main.c:16:17: warning: pointer targets in assignment from ‘const t_ascii *’ {aka ‘const char *’} to ‘const t_utf8 *’ {aka ‘const unsigned char *’} differ in signedness [-Wpointer-sign]
   16 |         utf8str = asciistr; // Ok: ascii to utf8 is safe
      |                 ^

Solution

  • Compile with -Wall. Always compile with -Wall.

    <user>@squall:~/src/p1$ gcc -Wall -c test2.c
    test2.c: In function ‘main’:
    test2.c:9:31: warning: pointer targets in initialization of ‘const t_utf8 *’ {aka ‘const signed char *’} from ‘char *’ differ in signedness [-Wpointer-sign]
        9 |     t_utf8  const*  utf8str = "你好世界";
          |                               ^~~~~~~~~~~~~~
    test2.c:11:13: warning: pointer targets in assignment from ‘const t_ascii *’ {aka ‘const char *’} to ‘const t_utf8 *’ {aka ‘const signed char *’} differ in signedness [-Wpointer-sign]
       11 |     utf8str = asciistr; // Ok: ascii to utf8 is safe
          |             ^
    test2.c:12:14: warning: pointer targets in assignment from ‘const t_utf8 *’ {aka ‘const signed char *’} to ‘const t_ascii *’ {aka ‘const char *’} differ in signedness [-Wpointer-sign]
       12 |     asciistr = utf8str; // Should issue warning: utf8 to ascii is not safe
          |              ^
    

    You want it to be safe to convert from t_ascii to t_utf8, but it simply is not: the signedness differs.

    The warning is not about the fact that valid utf8 is sometimes not valid ASCII - the compiler knows nothing about that. The warning is about the sign.

    If you want an unsigned char, compile with -funsigned-char. But then neither warning will be issued.
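
Because a typedef is only an alias, the compiler cannot tell t_ascii and t_utf8 apart beyond their underlying character types. One possible way to get a one-directional check is to wrap each pointer in its own struct type, so that mixing them is rejected outright (a hard error rather than a warning) and the safe direction goes through an explicit function. The following is only a sketch under that assumption; the names ascii_str, utf8_str, utf8_from_ascii and ascii_from_utf8 are hypothetical and do not come from the question's code.

```c
/* Hypothetical wrapper types (not from the question). Two distinct struct
   types are incompatible, so a direct `ascii = utf8;` is rejected. */
typedef struct { const char *bytes; } ascii_str;
typedef struct { const char *bytes; } utf8_str;

/* Widening is always safe: every ASCII string is also valid UTF-8. */
static utf8_str utf8_from_ascii(ascii_str s)
{
    utf8_str u = { s.bytes };
    return u;
}

/* Narrowing must be checked: it succeeds only if every byte is below 0x80.
   Returns 1 and fills *out on success, 0 otherwise. */
static int ascii_from_utf8(utf8_str s, ascii_str *out)
{
    const unsigned char *p = (const unsigned char *)s.bytes;
    for (; *p != 0; ++p) {
        if (*p >= 0x80)
            return 0;
    }
    out->bytes = s.bytes;
    return 1;
}
```

With these types, `u = utf8_from_ascii(a);` compiles, while a direct `a = u;` fails to compile because the two struct types are incompatible. The cost is that string literals have to be wrapped explicitly, for example `ascii_str a = { "Hello world" };`.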

    (By the way, if you think that type int_least8_t will be able to hold a multibyte character, i.e. a complete UTF-8 code point encoding, it will not. Every int_least8_t, and consequently every t_utf8, in a single compilation unit has exactly the same size.)
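
To illustrate the point about size, here is a minimal sketch that assumes the question's t_utf8 typedef: each t_utf8 object holds a single code unit, and a code point outside ASCII spans two to four of them. The helper names utf8_seq_len and utf8_count are made up for this example, and the code does not validate its input (overlong or out-of-range sequences are not detected).

```c
#include <stddef.h>
#include <stdint.h>

typedef uint_least8_t t_utf8;   /* same typedef as in the question */

/* Number of code units in the UTF-8 sequence that starts with `lead`,
   or 0 if `lead` is a continuation byte or matches no lead-byte pattern. */
static size_t utf8_seq_len(t_utf8 lead)
{
    if (lead < 0x80)           return 1;  /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}

/* Count code points in a NUL-terminated UTF-8 string by counting
   every code unit that is not a continuation byte (10xxxxxx). */
static size_t utf8_count(const t_utf8 *s)
{
    size_t n = 0;
    for (; *s != 0; ++s) {
        if ((*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For the byte sequence E4 BD A0 (the first character of the question's UTF-8 literal), utf8_seq_len reports three code units, so a single t_utf8 variable can hold only the first third of that character.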