On Windows, `wchar_t` is a UTF-16 (LE) formatted character, which is, for the most part, equivalent to `char16_t`. However, the two are still distinct types in the C++ type system, which makes me uncertain whether converting between sequences of these two character types is legal per the C++ standard.
My question is this: in C++17, is it legal to perform the following casts and to read through the converted pointers?
- `reinterpret_cast<const wchar_t*>(char16_ptr)`, where `decltype(char16_ptr)` is `const char16_t*`, and
- `reinterpret_cast<const char16_t*>(wchar_ptr)`, where `decltype(wchar_ptr)` is `const wchar_t*`
For the purposes of this question, assume the following:
- `sizeof(wchar_t) == sizeof(char16_t)`, and
- `wchar_t` is formatted the same as `char16_t` (as is the case on Windows)

Basically, is this a strict-aliasing violation?
My understanding is that the cast itself is valid thanks to [expr.reinterpret.cast]/7, but that the result of the cast cannot safely be used, since the object is being aliased through a type that is not `char`, `unsigned char`, or `std::byte`. Is this interpretation correct?
Note: Other questions have been asked about `wchar_t` and `char16_t` being the same, but as far as I can tell this question is not a duplicate of those. Notably, the question "Are wchar_t and char16_t the same on Windows?" performs a `reinterpret_cast` between pointers, but none of its answers address whether that cast was legal in the first place.
You already know the answer to this: strictly speaking, no.
`wchar_t` is not `char16_t`. Neither derives from the other. Neither is similar to the other. Neither is a signed/unsigned version of the other. Neither is an aggregate containing the other. And neither of them is a bytewise type (`char`, etc).
So you cannot access a `wchar_t` through a pointer/reference to a `char16_t`.
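The fact that the two are distinct types can be checked at compile time. The following is only a small sketch; the constant names are made up for illustration and are not part of the question or of any library.

```cpp
#include <type_traits>

// Hypothetical constant names, for illustration only.

// wchar_t and char16_t are distinct types, even on platforms where their
// size and encoding happen to coincide.
constexpr bool same_type = std::is_same<wchar_t, char16_t>::value;

// Neither type is one of the bytewise types that may alias any object.
constexpr bool wchar_is_bytewise =
    std::is_same<wchar_t, char>::value ||
    std::is_same<wchar_t, unsigned char>::value;
constexpr bool char16_is_bytewise =
    std::is_same<char16_t, char>::value ||
    std::is_same<char16_t, unsigned char>::value;

static_assert(!same_type, "wchar_t and char16_t are distinct types");
static_assert(!wchar_is_bytewise, "wchar_t is not a bytewise type");
static_assert(!char16_is_bytewise, "char16_t is not a bytewise type");
```

These checks say nothing about size or encoding; they only show that the type system treats the two as unrelated, which is what the aliasing rule cares about.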
If strictly avoiding strict-aliasing violations is your goal, you will have to copy the data into a different object. That is valid, assuming both types have the same representation.
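A copy along those lines might look like the sketch below. The helper name `copy_to_u16` is hypothetical, and the bytewise branch relies on the question's assumption that `wchar_t` and `char16_t` have the same size and representation. The fallback branch is there only so the sketch still compiles and behaves sensibly where the sizes differ, and it is correct only when every element already holds a single UTF-16 code unit value.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies n code units stored as wchar_t into a new
// std::u16string, without ever reading a wchar_t through a char16_t pointer.
std::u16string copy_to_u16(const wchar_t* src, std::size_t n) {
    std::u16string out(n, u'\0');
    if (n == 0) {
        return out;
    }
    if (sizeof(wchar_t) == sizeof(char16_t)) {
        // Assumed case from the question (e.g. Windows): same size and
        // representation, so the bytes are copied into distinct objects.
        std::memcpy(&out[0], src, n * sizeof(char16_t));
    } else {
        // Sizes differ: convert each value instead. This is only correct if
        // every element is already a single UTF-16 code unit value.
        for (std::size_t i = 0; i < n; ++i) {
            out[i] = static_cast<char16_t>(src[i]);
        }
    }
    return out;
}
```

For example, `copy_to_u16(L"hi", 2)` yields a `std::u16string` equal to `u"hi"`. The cost is an extra allocation and copy, which is the price of staying within the aliasing rules.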