I am working in an antiquated code base that used unsigned char*s to contain strings. For my functionality I've used strings, however there is a rub: I can't use anything in #include <cstring> in the old code. Copying from a string to an unsigned char* is a laborious process of:
unsigned char foo[12];
string bar{"Lorem Ipsum"};
transform(bar.cbegin(), bar.cbegin() + min(sizeof(foo) / sizeof(foo[0]), bar.size()), foo, [](auto i){return static_cast<unsigned char>(i);});
foo[sizeof(foo) / sizeof(foo[0]) - 1] = '\0';
Am I going to get into undefined behavior or aliasing problems if I just do:
strncpy(reinterpret_cast<char*>(foo), bar.c_str(), sizeof(foo) / sizeof(foo[0]) - 1);
foo[sizeof(foo) / sizeof(foo[0]) - 1] = '\0';
There is an explicit exception to the strict aliasing rule for [unsigned] char, so casting pointers between character types will just work.
Specifically, N3690 [basic.types] says that any trivially copyable object can be copied into an array of char or unsigned char, and if it is then copied back, the value is identical. It also says that if you copy the same array into a second object, the two objects hold the same value (paragraphs two and three).
[basic.lval] says it is legal to change an object through an lvalue of char or unsigned char type.
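As a rough illustration of those two guarantees, here is a minimal sketch. The Point type and both function names are hypothetical, made up for this example rather than taken from the question. The first function round-trips a trivially copyable object through an unsigned char array; the second modifies an unsigned char buffer through a char lvalue, which is the access the aliasing exception permits.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical trivially copyable type, used only for illustration.
struct Point { int x; int y; };
static_assert(std::is_trivially_copyable<Point>::value,
              "the [basic.types] guarantee only covers trivially copyable types");

// Copy an object into an unsigned char array and back again;
// the copy should hold the same value as the original.
bool round_trip_preserves_value() {
    Point original{3, 4};
    unsigned char bytes[sizeof(Point)];
    std::memcpy(bytes, &original, sizeof(Point));

    Point copy{};
    std::memcpy(&copy, bytes, sizeof(Point));
    return copy.x == original.x && copy.y == original.y;
}

// Modify an unsigned char buffer through a char lvalue.
bool write_through_char_lvalue() {
    unsigned char buf[4] = {'a', 'b', 'c', '\0'};
    char* p = reinterpret_cast<char*>(buf);
    p[0] = 'x';  // access through a char lvalue
    return buf[0] == static_cast<unsigned char>('x');
}
```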
The concern expressed by BobTFish in the comments about whether values in char and unsigned char are compatible is misplaced, I think. "Character" values are inherently of char type. You can store them in unsigned char and use them as char later - but that was happening already.
(I'd recommend writing a few in-line wrapper functions to make the whole thing less noisy, but I assume the code snippets were for exposition rather than actual usage.)
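One possible shape for such a wrapper, as a sketch only: copy_to_buffer is a hypothetical name, and it assumes the destination is a real array (not a decayed pointer) so that its size can be deduced at compile time.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies src into a fixed-size unsigned char array,
// truncating if necessary and always null-terminating.
template <std::size_t N>
void copy_to_buffer(unsigned char (&dest)[N], const std::string& src) {
    static_assert(N > 0, "destination needs room for the terminator");
    // Copy at most N - 1 characters; strncpy zero-pads when src is shorter.
    std::strncpy(reinterpret_cast<char*>(dest), src.c_str(), N - 1);
    dest[N - 1] = '\0';
}
```

With that in place, the call site from the question would shrink to something like copy_to_buffer(foo, bar);, and the sizeof arithmetic lives in one spot.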
Edit: Remove erroneous recommendation to use static_cast.
Edit2: Chapter and verse.