
std::byte on odd platforms


Reading Herb Sutter's blog post about the most recent C++ standard meeting, I noticed that std::byte was added to C++17. On a first reading, I have some concerns, since it is built on unsigned char so that it can avoid complications with the strict aliasing rules.

My biggest concern is, how does it work on platforms where CHAR_BIT is not 8? I have worked on/with platforms where CHAR_BIT is 16 or 32 (generally DSPs). Given that std::byte is for dealing with "byte-oriented access to memory", and most people understand byte to indicate an octet (not the size of the underlying character type), how will this work for individuals who expect that this will address contiguous 8-bit chunks of memory?

I already see people who simply assume that CHAR_BIT is 8 (not even knowing that CHAR_BIT exists...). A type called std::byte is likely to cause even more confusion.


I guess what I expected was a type that permits consistent addressing of, and access to, sequential octets in all cases. There are many octet-oriented protocols where it would be useful to have a library or type that is guaranteed to access memory one octet at a time on all platforms, no matter what CHAR_BIT is equal to on the given platform.

I can definitely understand wanting it to be well specified that something is being used as a sequence of bytes rather than a sequence of characters, but it doesn't seem as useful as many other things might be.


Solution

  • Given that std::byte is for dealing with "byte-oriented access to memory", and most people understand byte to indicate an octet (not the size of the underlying character type), how will this work for individuals who expect that this will address contiguous 8-bit chunks of memory?

    You can't misunderstand something and then expect the world to rearrange itself to fit your expectations.

    The reason most people think a byte and an octet are the same thing is that in most cases it is true: the vast majority of typical computers have CHAR_BIT == 8. That doesn't mean it is always true.

    • A byte is not an octet.
    • char, signed char and unsigned char have a size of one byte.

    The good news, though, is that people who don't know this are mostly people who don't need to know it. If you're working on a machine where a byte has more than 8 bits, you are the kind of developer who needs to know it more than anyone else.

    If we're talking theory here, then the answer is simple: just learn that a byte is different from an octet. If we're talking practice, then the answer is that you either already know the difference or you won't need to know it (hopefully :)). The worst case is learning this the hard way, but that applies only to a third, minority group: developers working on exotic platforms without exotic knowledge.


    If you want an equivalent for octets, it already exists in <cstdint>:

    • std::int8_t
    • std::uint8_t

    Note that they are "provided only if the implementation directly supports the type".