A byte type: std::byte vs std::uint8_t vs unsigned char vs char vs std::bitset<8>


C++ has a lot of types that vaguely describe the same thing. Assuming that we are compiling for an architecture where a byte is 8-bit, all of the following types are vaguely similar:

  • std::byte
  • std::uint8_t
  • std::bitset<8>
  • unsigned char (8-bit)
  • char (8-bit)

If a byte is 8-bit, are all these types more or less interchangeable? If not, when would one need to be used instead of another?

I often see questions like "Converting a hex string to a byte array" on Stack Overflow where someone uses std::uint8_t, char, unsigned char and other types to represent a "byte". Is this just a matter of stylistic preference?


Note: This Q&A is intended to be a community FAQ, and edits are encouraged. The question of when to use what type for a "byte" and why comes up all the time, despite C++17 having introduced std::byte which seemingly makes the choice obvious. Having an FAQ that addresses all the misconceptions about std::bitset, std::uint8_t, etc. being a "byte" is useful.


Solution

  • For 8-bit architectures, all the listed types are vaguely similar in the sense that they model something that has 8 bits. However, the use cases are fundamentally different, and only some of these types are guaranteed special properties that make them usable as a byte type.

    Overview

    Type | Definition | Purpose
    std::byte | enum class byte : unsigned char {}; | the canonical byte type; ✔️ all special properties
    unsigned char | fundamental type | character / legacy byte type / small arithmetic type; ✔️ all special properties
    signed char | fundamental type | character / small arithmetic type; ❌ no special properties
    char | fundamental type, same underlying type as signed char or unsigned char | a character; ⚠️ only some special properties
    char8_t | fundamental type with underlying type unsigned char | UTF-8 character; ❌ no special properties
    std::uint8_t | typedef unsigned char uint8_t; (This is not guaranteed, just the most common implementation.) | 8-bit unsigned arithmetic; ⚠️ special properties not guaranteed
    std::bitset<8> | template <std::size_t N> class bitset; | set of 8 bits; might be wider than 8 bits; ❌ no special properties

    See the appendix at the end of this answer for a list of all these special properties, type by type.

    std::byte (C++17)

    This is the canonical byte type in C++. Whenever you have to ask yourself the question "Which type should I use to represent these bytes?", std::byte is the answer.

    Note that std::byte is very special because there are many relaxations that allow you to use the type in otherwise undefined ways. For example, the strict aliasing rule is relaxed for std::byte ([basic.lval] p11), meaning that you can examine any object as an array of std::bytes.

    Most other types don't have these special powers, and attempting to use them as a byte would be undefined behavior.
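    As a rough illustration of the relaxed aliasing rule, the sketch below copies the object representation of an arbitrary object into a sequence of std::byte. The helper name object_bytes and the usage function are hypothetical and not part of the answer; the usage check additionally assumes that unsigned int has no padding bits, which holds on mainstream platforms.

```cpp
#include <cassert>
#include <cstddef>  // std::byte, std::to_integer
#include <vector>

// Hypothetical helper (not from the answer): copies the object
// representation of `object` into a vector of std::byte.
template <typename T>
std::vector<std::byte> object_bytes(const T& object) {
    // Reading an object through a std::byte pointer is what the relaxed
    // aliasing rule permits; doing the same through, say, int* would not be.
    const auto* first = reinterpret_cast<const std::byte*>(&object);
    return std::vector<std::byte>(first, first + sizeof(T));
}

// Usage sketch: every byte of an unsigned int holding zero is zero
// (assumes unsigned int has no padding bits).
inline void object_bytes_usage() {
    const unsigned value = 0;
    for (std::byte b : object_bytes(value)) {
        assert(std::to_integer<int>(b) == 0);
    }
}
```

    The same helper would also be valid with unsigned char or char in place of std::byte, since those types share this particular property.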

    As appropriate as std::byte is for raw memory operations, many older APIs such as the <iostream> library predate it and aren't designed around it. The type is also somewhat clunky (e.g. my_byte == 0 is not possible). Don't attempt to forcefully use it with libraries that weren't designed for std::byte.
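    To make the "clunky" part concrete, here is a minimal sketch of how one might work with std::byte values: comparisons need another std::byte, only bitwise operators are provided, and std::to_integer is the way back to a number. The helper names is_zero, low_nibble and to_number are made up for this example.

```cpp
#include <cstddef>  // std::byte, std::to_integer

// Hypothetical helpers, named only for this sketch.
constexpr bool is_zero(std::byte b) {
    // `b == 0` would not compile: std::byte is a scoped enumeration and
    // does not convert to or from int implicitly.
    return b == std::byte{0};
}

constexpr std::byte low_nibble(std::byte b) {
    // Bitwise operators are provided for std::byte; arithmetic ones are not.
    return b & std::byte{0x0F};
}

constexpr unsigned to_number(std::byte b) {
    // Explicit conversion back to an integer type.
    return std::to_integer<unsigned>(b);
}

static_assert(is_zero(std::byte{0}));
static_assert(to_number(low_nibble(std::byte{0xAB})) == 0x0B);
```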


    See also: Is there 'byte' data type in C++?, What is the purpose of std::byte?, P0298 - A byte type definition

    unsigned char

    This is the closest thing to a "byte" there is prior to C++17. unsigned char has all the special properties that a std::byte has.

    However, the name is very confusing and it's also treated as a character in some contexts. For example, std::ostream::operator<< prints it as an ASCII character, instead of printing its numeric value. Also, doing arithmetic with unsigned char promotes it to int before any operation, which seems inappropriate for a "byte".
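    Both quirks can be observed with a small sketch. It assumes an ASCII-compatible execution character set (so that 65 is 'A') and a platform where int is wider than unsigned char (so that promotion yields int); streamed is a hypothetical helper, not a standard function.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical helper: formats a value the way operator<< would print it.
template <typename T>
std::string streamed(T value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Arithmetic on unsigned char operands is carried out in int after
// integral promotion, so the sum below does not wrap around at 256.
constexpr unsigned char lhs = 200;
constexpr unsigned char rhs = 100;
static_assert(std::is_same_v<decltype(lhs + rhs), int>);
static_assert(lhs + rhs == 300);

inline void unsigned_char_usage() {
    const unsigned char code = 65;
    assert(streamed(code) == "A");    // treated as a character
    assert(streamed(+code) == "65");  // unary + promotes to int first
}
```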

    All in all, it's a wishy-washy type that is simultaneously a byte, a character, and an arithmetic type. Prefer std::byte, char, std::uint8_t, or std::uint_least8_t instead.


    See also: How to use new std::byte type in places where old-style unsigned char is needed?

    signed char

    The signed counterpart to unsigned char is similarly confused. It has almost none of the special properties that std::byte and unsigned char have, and is a strange mix of arithmetic and character type. It should also be avoided.

    A better alternative is std::int_least8_t which is also signed, and also guaranteed to be at least 8 bits wide, but which doesn't have a weird connotation of also being a character.


    See also: Difference between signed / unsigned char

    char

    This is a distinct type which has the same underlying type as signed char or unsigned char. It has most (but not all) of the special properties of unsigned char and std::byte. For example, unlike an unsigned char[], a char[] does not provide storage ([intro.object] p3) for objects created inside it.

    char should be used for what the name says: a character.
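    That char is a third, distinct type can be checked at compile time. The sketch below is only an illustration; CharKind and kind_of are hypothetical names introduced here to show that overload resolution tells the three types apart.

```cpp
#include <limits>
#include <type_traits>

// char is neither signed char nor unsigned char, whatever its signedness.
static_assert(!std::is_same_v<char, signed char>);
static_assert(!std::is_same_v<char, unsigned char>);

// Hypothetical overload set: each character type selects its own overload.
enum class CharKind { plain, signed_, unsigned_ };
constexpr CharKind kind_of(char) { return CharKind::plain; }
constexpr CharKind kind_of(signed char) { return CharKind::signed_; }
constexpr CharKind kind_of(unsigned char) { return CharKind::unsigned_; }

static_assert(kind_of('a') == CharKind::plain);

// Whether plain char is signed is implementation-defined.
constexpr bool plain_char_is_signed = std::numeric_limits<char>::is_signed;
static_assert(plain_char_is_signed == std::is_signed_v<char>);
```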


    See also: char!=(signed char), char!=(unsigned char)

    char8_t (C++20)

    There was originally some discussion about this type having special properties akin to char, but it ended up having none. Its underlying type is unsigned char, but unlike with std::byte, this doesn't mean that it inherits any properties from it.

    It should be used as a UTF-8 character, possibly within a UTF-8 encoded string.

    std::uint8_t (C++11)

    This type is a design mistake that originated in C. While this isn't guaranteed, it is usually implemented as a type alias like

    typedef unsigned char uint8_t;
    

    This means that it has the special properties that unsigned char has in practice (since all compilers implement it like this), but none of this is guaranteed by the standard. The fact that it can alias every other type can also make it detrimental to performance, compared to if it was an alias for a unique type.

    One thing to note is that a byte isn't guaranteed to be 8 bits in C++. Many people use std::uint8_t because it offers a perceived safety of really being 8 bits. However, std::uint8_t is optional and doesn't exist on platforms where a byte is wider than 8 bits, so it is no more portable than:

    #include <climits>
    static_assert(CHAR_BIT == 8); // ... and use unsigned char or char as a byte type
    

    For a more portable 8-bit arithmetic type, there are std::uint_least8_t and std::uint_fast8_t, which are guaranteed to exist but may be wider than 8 bits.

    Note that std::uint8_t, std::uint_least8_t, and std::uint_fast8_t may all be promoted to int, just like unsigned char.
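    Because these types may be wider than 8 bits and are promoted before arithmetic, code that wants wrap-around at 256 has to ask for it explicitly. The following is a minimal sketch of that idea; add_mod_256 is a hypothetical helper, not something the standard library provides.

```cpp
#include <climits>
#include <cstdint>

// Both types always exist and hold at least 8 bits, possibly more.
static_assert(sizeof(std::uint_least8_t) * CHAR_BIT >= 8);
static_assert(sizeof(std::uint_fast8_t) * CHAR_BIT >= 8);

// Integral promotion means the sum is never computed in a type
// narrower than int.
static_assert(sizeof(std::uint_least8_t{} + std::uint_least8_t{}) >= sizeof(int));

// Hypothetical helper: 8-bit wrapping addition that does not rely on the
// operand type being exactly 8 bits wide.
constexpr std::uint_least8_t add_mod_256(std::uint_least8_t a,
                                         std::uint_least8_t b) {
    // a + b is evaluated in the promoted type, so mask the result
    // down to 8 bits explicitly.
    return static_cast<std::uint_least8_t>((a + b) & 0xFF);
}

static_assert(add_mod_256(200, 100) == 44);  // 300 mod 256
```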


    See also: uint8_t vs unsigned char, What platforms have something other than 8-bit char?

    std::bitset<8>

    This is the furthest from a "byte" type. It models a sequence of bits, or a set of numbers, depending on perspective.

    A std::bitset<8> is at least as large as int in most implementations, so it isn't even 8 bits large. Only use this type for what the name says: a set of bits. It is not a byte.
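    The gap between logical size and storage size can be inspected directly. This sketch only asserts what the standard guarantees; the actual storage size is implementation-specific, and bitset8_storage_bits is a name introduced for this example.

```cpp
#include <bitset>
#include <climits>
#include <cstddef>

// The logical size is exactly 8 bits ...
static_assert(std::bitset<8>{}.size() == 8);

// ... but the object may occupy more storage than that. The exact figure
// depends on the standard library implementation.
constexpr std::size_t bitset8_storage_bits = sizeof(std::bitset<8>) * CHAR_BIT;
static_assert(bitset8_storage_bits >= 8);
```

    Printing bitset8_storage_bits on a given toolchain shows how much storage that implementation actually uses.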

    Conclusion

    std::byte is the only type which models a byte, nothing more, nothing less. It should be preferred as a byte type whenever possible. All other types are either missing crucial properties or have a fundamentally different purpose than being a byte.

    Appendix

    Special properties of std::byte and ordinary character types

    Section | Affected Types | Special Properties
    [intro.object] p3 | unsigned char[], std::byte[] | array provides storage for objects placed inside
    [intro.object] p13 | unsigned char[], std::byte[] | array implicitly creates objects inside when its lifetime begins
    [basic.life] p6.4 | cv char*, cv unsigned char*, and cv std::byte* | static_cast of pointers to objects outside lifetime is allowed
    [basic.indet] | unsigned ordinary character types, std::byte | indeterminate results allowed when initializing and assigning
    [basic.types.general] p2 | char[], unsigned char[], std::byte[] | trivially copyable objects can have their value transferred via an array
    [basic.lval] p11.3 | char, unsigned char, std::byte | relaxed strict aliasing
    [expr.new] p16 | char[], unsigned char[], std::byte[] | stricter alignment in a new-expression
    [bit.cast] p2 | unsigned ordinary character types, std::byte | indeterminate results allowed for std::bit_cast

    Note: it's unclear what unsigned ordinary character type actually means. See Editorial Issue 5070.
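    As one example of these properties in use, the [basic.types.general] p2 entry can be sketched as a round trip of a trivially copyable value through an unsigned char array. The function name round_trip_through_bytes is hypothetical, and the sketch additionally assumes that T is default-constructible.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies a value into a byte array and back.
// Assumes T is trivially copyable and default-constructible.
template <typename T>
T round_trip_through_bytes(const T& value) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "only trivially copyable types may be copied bytewise");
    unsigned char buffer[sizeof(T)];
    std::memcpy(buffer, &value, sizeof(T));  // object -> bytes
    T copy;
    std::memcpy(&copy, buffer, sizeof(T));   // bytes -> object
    return copy;                             // holds the original value
}

inline void round_trip_usage() {
    assert(round_trip_through_bytes(42) == 42);
    assert(round_trip_through_bytes(3.5) == 3.5);
}
```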