Tags: c++, c-strings, strcpy, reinterpret-cast, unsigned-char

Is It Legal to Cast Away the Sign on a Pointer?


I am working in an antiquated code base that uses unsigned char*s to hold strings. For my own functionality I've used strings; however, there is a rub:

I can't use anything from <cstring> in the old code. Copying from a string to an unsigned char* is a laborious process:

unsigned char foo[12];
string bar{"Lorem Ipsum"};

// Convert and copy at most sizeof(foo) characters, then force null termination.
transform(bar.cbegin(), bar.cbegin() + min(sizeof(foo) / sizeof(foo[0]), bar.size()), foo,
          [](auto i){ return static_cast<unsigned char>(i); });
foo[sizeof(foo) / sizeof(foo[0]) - 1] = '\0';

Am I going to get into undefined behavior or aliasing problems if I just do:

strncpy(reinterpret_cast<char*>(foo), bar.c_str(), sizeof(foo) / sizeof(foo[0]) - 1);
foo[sizeof(foo) / sizeof(foo[0]) - 1] = '\0';

Solution

  • There is an explicit exception to the strict aliasing rule for [unsigned] char, so casting pointers between character types will just work.

    Specifically, in N3690, [basic.types] says that any trivially copyable object can be copied into an array of char or unsigned char and, if the bytes are then copied back, the object holds its original value. It also says that if you copy the bytes of one object into a second object of the same type, the second object holds the same value as the first (paragraphs 2 and 3).
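    As a rough illustration of that round-trip guarantee, here is a minimal sketch. The Point type and the round_trip function are hypothetical names invented for this example; any trivially copyable type should behave the same way.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical example type; any trivially copyable type would do.
struct Point {
    int x;
    int y;
};
static_assert(std::is_trivially_copyable<Point>::value,
              "Point must be trivially copyable");

// Copies `original` into an unsigned char array, then copies those bytes
// into a second object and returns it.
Point round_trip(const Point& original) {
    unsigned char bytes[sizeof(Point)];
    std::memcpy(bytes, &original, sizeof(Point));  // object -> byte array
    Point copy;
    std::memcpy(&copy, bytes, sizeof(Point));      // byte array -> object
    return copy;
}
```

    Calling round_trip on a Point should give back an object with the same x and y as the one passed in.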

    [basic.lval] says it is legal to access (read or modify) the stored value of an object through an lvalue of char or unsigned char type.
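    Applied to the question, that means writing into the unsigned char array through a char* and reading it back the same way should be fine. A small sketch, assuming a 12-byte buffer like the one in the question; write_and_read_back is a hypothetical name used only for this illustration:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Writes `text` into an unsigned char buffer through a char*, then reads the
// buffer back through the same char* and returns the result as a std::string.
std::string write_and_read_back(const std::string& text) {
    unsigned char buffer[12] = {};  // zero-initialised
    char* as_char = reinterpret_cast<char*>(buffer);
    // Copy at most 11 characters, leaving room for the terminator.
    std::strncpy(as_char, text.c_str(), sizeof(buffer) - 1);
    buffer[sizeof(buffer) - 1] = '\0';
    return std::string(as_char);
}
```

    Input that fits comes back unchanged, and longer input is truncated to the first 11 characters.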

    The concern BobTFish raised in the comments, about how values compare between char and unsigned char, is misplaced, I think. "Character" values are inherently of char type. You can store them in unsigned char and use them as char later, but that was happening already.

    (I'd recommend writing a few inline wrapper functions to make the whole thing less noisy, but I assume the code snippets were for exposition rather than actual usage.)
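    One possible shape for such wrappers is sketched below. The names copy_to_buffer and buffer_to_string are hypothetical, and the sketch assumes the destination is a real array whose size the compiler can deduce, not a bare pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies `src` into a fixed-size unsigned char array,
// truncating if necessary and always null-terminating.
template <std::size_t N>
inline void copy_to_buffer(unsigned char (&dst)[N], const std::string& src) {
    std::strncpy(reinterpret_cast<char*>(dst), src.c_str(), N - 1);
    dst[N - 1] = '\0';
}

// Hypothetical helper for the other direction: reads a null-terminated
// unsigned char buffer as a C string and returns it as a std::string.
inline std::string buffer_to_string(const unsigned char* src) {
    return std::string(reinterpret_cast<const char*>(src));
}
```

    Keeping the reinterpret_cast inside these two functions means call sites reduce to something like copy_to_buffer(foo, bar), and the array size is deduced rather than spelled out with sizeof each time.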

    Edit: Remove erroneous recommendation to use static_cast.

    Edit2: Chapter and verse.