C++ has a lot of types that vaguely describe the same thing. Assuming that we are compiling for an architecture where a byte is 8-bit, all of the following types are vaguely similar:
std::byte
std::uint8_t
std::bitset<8>
unsigned char (8-bit)
char (8-bit)
If a byte is 8-bit, are all these types more or less interchangeable? If not, when would one need to be used instead of another?
I often see questions like "Converting a hex string to a byte array" on Stack Overflow where someone uses std::uint8_t, char, unsigned char, and other types to represent a "byte". Is this just a matter of stylistic preference?
Note: This Q&A is intended to be a community FAQ, and edits are encouraged. The question of when to use what type for a "byte" and why comes up all the time, despite C++17 having introduced std::byte, which seemingly makes the choice obvious. Having an FAQ that addresses all the misconceptions about std::bitset, std::uint8_t, etc. being a "byte" is useful.
For 8-bit architectures, all the listed types are vaguely similar in the sense that they model something that has 8 bits. However, the use cases are fundamentally different, and only some of these types are guaranteed special properties that make them usable as a byte type.
Type | Definition | Purpose |
---|---|---|
std::byte | enum class byte : unsigned char {}; | the canonical byte type ✔️ all special properties |
unsigned char | fundamental type | character / legacy byte type / small arithmetic type ✔️ all special properties |
signed char | fundamental type | character / small arithmetic type ❌ no special properties |
char | fundamental type, same underlying type as signed char or unsigned char | a character ⚠️ only some special properties |
char8_t | fundamental type with underlying type unsigned char | UTF-8 character ❌ no special properties |
std::uint8_t | typedef unsigned char uint8_t; (This is not guaranteed, just the most common implementation.) | 8-bit unsigned arithmetic ⚠️ special properties not guaranteed |
std::bitset<8> | template <std::size_t N> class bitset; | set of 8 bits; might be wider than 8 bits ❌ no special properties |
See the appendix at the end for a list of all these special properties, type by type.
std::byte (C++17)
This is the canonical byte type in C++. Whenever you have to ask yourself the question "Which type should I use to represent these bytes?", std::byte is the answer.
Note that std::byte is very special because there are many relaxations that allow you to use the type in ways that would otherwise be undefined behavior. For example, the strict aliasing rule is relaxed for std::byte ([basic.lval] p11), meaning that you can examine any object as an array of std::byte.
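As a rough illustration of the aliasing relaxation, the following sketch reads the object representation of a trivially copyable object through a const std::byte*. The helper name object_bytes is hypothetical (it is not a standard facility), and the sketch assumes C++17 or later.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// Hypothetical helper: returns a copy of the object representation of `value`.
template <typename T>
std::array<std::byte, sizeof(T)> object_bytes(const T& value) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "only meaningful for trivially copyable types");
    std::array<std::byte, sizeof(T)> result{};
    // Accessing the object through a std::byte glvalue is allowed by the
    // relaxed aliasing rule; doing the same through, say, a short* is not.
    const std::byte* p = reinterpret_cast<const std::byte*>(&value);
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        result[i] = p[i];
    }
    return result;
}
```

The order of the bytes in the result depends on the platform (endianness, padding), so only the size and the raw contents should be relied upon.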
Most other types don't have these special powers, and attempting to use them as a byte would be undefined behavior.
As appropriate as std::byte is for raw memory operations, many older APIs such as the <iostream> library predate it and aren't designed around it.
The type is also somewhat clunky (e.g. my_byte == 0 is not possible).
Don't attempt to forcefully use it with libraries that weren't designed for std::byte.
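To make the clunkiness concrete, here is a small sketch of what has to be written instead of my_byte == 0 or my_byte & 1. The function names low_bit_set and as_int are made up for this example; std::to_integer and the bitwise operators for std::byte are the standard ones from <cstddef>.

```cpp
#include <cstddef>

// Hypothetical helper: true if the lowest bit of b is set.
constexpr bool low_bit_set(std::byte b) {
    // `b & 1` and `... != 0` do not compile: both operands must be std::byte.
    return (b & std::byte{1}) != std::byte{0};
}

// Hypothetical helper: converts a byte to its numeric value explicitly.
constexpr int as_int(std::byte b) {
    return std::to_integer<int>(b);
}
```

The explicit conversions are intentional: std::byte is not an arithmetic type, so every trip to and from integers has to be spelled out.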
See also: Is there 'byte' data type in C++?, What is the purpose of std::byte?, P0298 - A byte type definition
unsigned char
This is the closest thing to a "byte" there is prior to C++17.
unsigned char has all the special properties that std::byte has.
However, the name is very confusing and it's also treated as a character in some contexts. For example, the stream insertion operator (operator<< for std::ostream) prints it as a character instead of printing its numeric value. Also, doing arithmetic with unsigned char promotes it to int before any operation, which seems inappropriate for a "byte".
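Both quirks can be seen in a short sketch. The helper names print_as_char and print_as_number are hypothetical, and the static_assert assumes a typical platform where int is wider than unsigned char (otherwise the promoted type would be unsigned int).

```cpp
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical helper: streams the value directly, so it is treated as a character.
std::string print_as_char(unsigned char c) {
    std::ostringstream os;
    os << c;
    return os.str();
}

// Hypothetical helper: unary + promotes to int first, so the numeric value is printed.
std::string print_as_number(unsigned char c) {
    std::ostringstream os;
    os << +c;
    return os.str();
}

// Arithmetic on unsigned char happens in int (assumption: int is wider than char).
constexpr unsigned char one = 1;
static_assert(std::is_same_v<decltype(one + one), int>,
              "unsigned char operands are promoted to int");
```

On an ASCII-based platform, print_as_char(65) yields "A" while print_as_number(65) yields "65".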
All in all, it's a wishy-washy type that is simultaneously a byte, a character, and an arithmetic type. Prefer std::byte, char, std::uint8_t, or std::uint_least8_t instead.
See also: How to use new std::byte type in places where old-style unsigned char is needed?
signed char
The signed counterpart to unsigned char is similarly confused. It has almost none of the special properties that std::byte and unsigned char have, and is a strange mix of arithmetic and character type. It should also be avoided.
A better alternative is std::int_least8_t, which is also signed, and also guaranteed to be at least 8 bits wide, but which doesn't have a weird connotation of also being a character.
See also: Difference between signed / unsigned char
char
This is a distinct type which has the same underlying type as signed char or unsigned char.
It has most (but not all) of the special properties of unsigned char and std::byte.
For example, unlike an unsigned char[], a char[] does not provide storage ([intro.object] p3) for objects created in it.
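A minimal sketch of what "providing storage" looks like in code: an object is created with placement new inside a suitably aligned unsigned char array. The function name construct_in_buffer is hypothetical. The same code with a char array compiles and typically behaves the same in practice; the difference is only in what the standard's wording covers.

```cpp
#include <new>

// Hypothetical helper: constructs an int inside a byte buffer and reads it back.
int construct_in_buffer(int value) {
    // The array must be large enough and suitably aligned for the new object.
    alignas(int) unsigned char buffer[sizeof(int)];
    // Per [intro.object] p3, the unsigned char array provides storage for the int.
    int* p = ::new (static_cast<void*>(buffer)) int(value);
    return *p;
}
```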
char should be used for what the name says: a character.
See also: char!=(signed char), char!=(unsigned char)
char8_t (C++20)
There was originally some discussion about this type having special properties akin to char, but it ended up having none.
Its underlying type is unsigned char, but unlike for std::byte, this doesn't mean that it inherits any special properties from unsigned char.
It should be used as a UTF-8 character, possibly within a UTF-8 encoded string.
std::uint8_t (C++11)
This type is a design mistake that originated in C. While this isn't guaranteed, it is usually implemented as a type alias like
typedef unsigned char uint8_t;
This means that it has the special properties that unsigned char has in practice (since mainstream compilers implement it like this), but none of this is guaranteed by the standard.
The fact that it can alias every other type can also make it detrimental to performance, compared to if it were an alias for a unique type.
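The performance concern can be sketched with a loop whose bound is read through a pointer. The function zero_prefix is a made-up example; the comment describes what an optimizer is generally forced to assume when std::uint8_t is an alias for unsigned char, not the behavior of any specific compiler.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example: zero the first *count bytes of dst.
void zero_prefix(std::uint8_t* dst, const std::size_t* count) {
    // If std::uint8_t is unsigned char, a store through dst may legally
    // modify the bytes of *count, so the compiler generally cannot assume
    // that *count stays the same from one iteration to the next.
    for (std::size_t i = 0; i < *count; ++i) {
        dst[i] = 0;
    }
}
```

Copying *count into a local variable before the loop is the usual way to sidestep the issue.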
One thing to note is that a byte isn't guaranteed to be 8 bits in C++.
Many people use std::uint8_t because it offers a perceived safety of really being 8 bits.
However, std::uint8_t is optional and doesn't exist on platforms where a byte is wider than 8 bits, so it is no more portable than:
#include <climits>
static_assert(CHAR_BIT == 8); // ... and use unsigned char or char as a byte type
For a more portable 8-bit arithmetic type, there are std::uint_least8_t and std::uint_fast8_t, which are guaranteed to exist but may be wider than 8 bits.
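Because these types may be wider than 8 bits, code that relies on wrap-around at 256 has to mask explicitly. A minimal sketch, with the hypothetical helper add_mod256:

```cpp
#include <climits>
#include <cstdint>

// Hypothetical helper: adds two values and wraps the result to 8 bits,
// even on a platform where std::uint_least8_t is wider than 8 bits.
constexpr std::uint_least8_t add_mod256(std::uint_least8_t a, std::uint_least8_t b) {
    // a + b is computed in a promoted type; the mask keeps the low 8 bits.
    return static_cast<std::uint_least8_t>((a + b) & 0xFF);
}

// Guaranteed by the standard: the type has at least 8 bits.
static_assert(sizeof(std::uint_least8_t) * CHAR_BIT >= 8,
              "uint_least8_t is at least 8 bits wide");
```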
Note that std::uint8_t, std::uint_least8_t, and std::uint_fast8_t may all be promoted to int, just like unsigned char.
See also: uint8_t vs unsigned char, What platforms have something other than 8-bit char?
std::bitset<8>
This is the furthest from a "byte" type. It models a sequence of bits, or a set of numbers depending on perspective.
A std::bitset<8> is at least as large as int in most implementations, so it isn't even 8 bits large. Only use this type for what the name says: a set of bits. It is not a byte.
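A brief sketch of using std::bitset<8> for what it is, a set of bits. The names count_flags and bitset8_size are hypothetical; the exact value of sizeof(std::bitset<8>) is implementation-defined, so the sketch only records it rather than assuming a number.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical helper: counts how many of the low 8 flag bits are set.
std::size_t count_flags(unsigned long long mask) {
    // Bits above the 8th are discarded by the bitset constructor.
    return std::bitset<8>(mask).count();
}

// Implementation-defined; often the size of an int or a machine word.
constexpr std::size_t bitset8_size = sizeof(std::bitset<8>);
```

Printing bitset8_size on a given toolchain is a quick way to confirm that the type usually occupies more than one byte.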
std::byte is the only type which models a byte, nothing more, nothing less. It should be preferred as a byte type whenever possible.
All other types are either missing crucial properties or have a fundamentally different purpose than being a byte.
Appendix: special properties of std::byte and ordinary character types
Section | Affected Types | Special Properties |
---|---|---|
[intro.object] p3 | unsigned char[], std::byte[] | array provides storage for objects placed inside |
[intro.object] p13 | unsigned char[], std::byte[] | array implicitly creates objects inside when its lifetime begins |
[basic.life] p6.4 | cv char*, cv unsigned char*, and cv std::byte* | static_cast of pointers to objects outside lifetime is allowed |
[basic.indet] | unsigned ordinary character types, std::byte | indeterminate results allowed when initializing and assigning |
[basic.types.general] p2 | char[], unsigned char[], std::byte[] | trivially copyable objects can have their value transferred via an array |
[basic.lval] p11.3 | char, unsigned char, std::byte | relaxed strict aliasing |
[expr.new] p16 | char[], unsigned char[], std::byte[] | stricter alignment in a new-expression |
[bit.cast] p2 | unsigned ordinary character types, std::byte | indeterminate results allowed for std::bit_cast |
Note: it's unclear what unsigned ordinary character type actually means. See Editorial Issue 5070.
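As an example of one row, the [basic.types.general] p2 guarantee can be sketched as a round trip through an unsigned char array. The Point struct and the copy_via_bytes function are hypothetical names used only for illustration.

```cpp
#include <cstring>

// Hypothetical trivially copyable type.
struct Point {
    int x;
    int y;
};

// Hypothetical helper: transfers the value of src through a byte array.
Point copy_via_bytes(const Point& src) {
    unsigned char buffer[sizeof(Point)];
    std::memcpy(buffer, &src, sizeof(Point));  // object -> bytes
    Point dst;
    std::memcpy(&dst, buffer, sizeof(Point));  // bytes -> object
    return dst;                                // holds the same value as src
}
```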