
Copying big endian float data directly into a vector<float> and byte swapping in place. Is it safe?

I'd like to be able to copy big endian float arrays directly from an unaligned network buffer into a std::vector<float> and perform the byte swapping back to host order "in place", without involving an intermediate std::vector<uint32_t>. Is this even safe? I'm worried that the big endian float data may accidentally be interpreted as NaNs and trigger unexpected behavior. Is this a valid concern?

For the purposes of this question, assume that the host machine receiving the data is little endian.

Here's some code that demonstrates what I'm trying to do:

#include <arpa/inet.h>  // htonl, ntohl (POSIX)
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

int main()
{
    std::vector<float> source{1.0f, 2.0f, 3.0f, 4.0f};
    std::size_t number_count = source.size();

    // Simulate big-endian float values being received from network and stored
    // in byte buffer. A temporary uint32_t vector is used to transform the
    // source data to network byte order (big endian) before being copied
    // to a byte buffer.
    std::vector<uint32_t> temp(number_count, 0);
    std::size_t byte_length = number_count * sizeof(float);
    std::memcpy(temp.data(), source.data(), byte_length);
    for (uint32_t& datum: temp)
        datum = ::htonl(datum);
    std::vector<uint8_t> buffer(byte_length, 0);
    std::memcpy(buffer.data(), temp.data(), byte_length);
    // buffer now contains the big endian float data; in real code it may not
    // be aligned at word boundaries

    // Copy the received network buffer data directly into the destination float vector
    std::vector<float> numbers(number_count, 0.0f);
    std::memcpy(numbers.data(), buffer.data(), byte_length); // IS THIS SAFE??

    // Perform the byte swap back to host order (little endian) in place,
    // to avoid needing to allocate an intermediate uint32_t vector.
    auto ptr = reinterpret_cast<uint8_t*>(numbers.data());
    for (std::size_t i = 0; i < number_count; ++i)
    {
        // IS THIS SAFE??
        uint32_t datum;
        std::memcpy(&datum, ptr, sizeof(datum));
        datum = ::ntohl(datum);
        std::memcpy(ptr, &datum, sizeof(datum));
        ptr += sizeof(datum);
    }

    assert(numbers == source);
}

Note the two "IS THIS SAFE??" comments above.

Motivation: I'm writing a CBOR serialization library with support for typed arrays. CBOR allows typed arrays to be transmitted as either big endian or little endian.

EDIT: Replaced illegal reinterpret_cast<uint32_t*> type punning in endian swap loop with memcpy.


  • After your edit:

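    Regarding the reinterpret_cast<uint32_t*> in the byte-swap loop before the edit: this is not allowed in C++. Accessing an object through a pointer to an unrelated type violates the strict aliasing rule; the only exception is access through char, unsigned char or std::byte. uint8_t is covered only when it is a typedef for unsigned char, which in practice is the case whenever CHAR_BIT == 8.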
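    As an illustration of that exception (a sketch, not code from the question), the in-place swap can be written entirely through an unsigned char pointer, reversing each group of sizeof(float) bytes with std::reverse, so neither a uint32_t* cast nor a temporary is needed. The function name swap_float_bytes_in_place is hypothetical, and the code assumes 4-byte IEEE 754 floats.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: reverse the byte order of every float in `v` in place.
// Reading and writing an object's bytes through unsigned char* is permitted
// by the aliasing rules, unlike going through uint32_t*.
// Assumes sizeof(float) == 4 (IEEE 754 binary32).
void swap_float_bytes_in_place(std::vector<float>& v) {
    unsigned char* p = reinterpret_cast<unsigned char*>(v.data());
    for (std::size_t i = 0; i < v.size(); ++i) {
        std::reverse(p, p + sizeof(float));  // swap this element's 4 bytes
        p += sizeof(float);
    }
}
```

    Reversing the 4 bytes of an object is the same as byte-swapping its 32-bit value on a host of either endianness, so calling the helper twice restores the original values.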

    Old answer: the text below refers to the question before the edit (the version with bit_cast).

    This is safe, provided sizeof(float) == sizeof(uint32_t).
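    That precondition can be checked at compile time so the code refuses to build on a platform where it does not hold. A minimal sketch; the is_iec559 check is an additional assumption (IEEE 754 binary32 floats, which is what CBOR transmits) beyond the size condition stated above.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Compile-time guards for the byte-copying approach:
// a float must occupy exactly as many bytes as a uint32_t ...
static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits wide");
// ... and (assumption) use the IEEE 754 format, so the wire bytes mean the same thing.
static_assert(std::numeric_limits<float>::is_iec559, "float must be IEEE 754");
```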

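    Don't worry about signaling NaNs. Floating-point exceptions are usually masked, and even when they are enabled, the invalid-operation exception is only raised when a floating-point operation (arithmetic, comparison, conversion) consumes a signaling NaN. Copying bytes with memcpy is not a floating-point operation, and the move instructions it compiles to do not raise floating-point exceptions.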
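    The concern in the question is not imaginary: the byte-swapped image of an ordinary float can have a NaN bit pattern. The sketch below uses a value chosen purely for illustration (bits 0x0100C07F, a normal float) whose swapped bits are 0x7FC00001, a quiet NaN. It also shows that moving the bits through memcpy carries that pattern over unchanged, so swapping back recovers the original exactly. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reverse the byte order of a 32-bit value using shifts (host-independent).
std::uint32_t byteswap32(std::uint32_t x) {
    return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8) |
           ((x & 0x00FF0000u) >> 8)  | ((x & 0xFF000000u) >> 24);
}

// Reinterpret 32 bits as a float by copying bytes; no floating-point
// arithmetic is involved, so the bit pattern is preserved.
float float_from_bits(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

std::uint32_t bits_from_float(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

    A quiet NaN is used deliberately so the example does not depend on how a signaling NaN behaves when a float is returned by value.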

    Accessing the vector elements through the data() pointer is supported for both reading and writing, since std::vector is guaranteed to store its elements contiguously.
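    Because memcpy has no alignment requirement, the source bytes may start at any offset in the network buffer, which is the unaligned case the question worries about. A minimal sketch with a hypothetical helper name; the bytes are copied as-is, and any byte swapping is a separate step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copy `count` floats' worth of raw bytes starting at
// `src` (any alignment) into a new vector through its contiguous data() storage.
std::vector<float> copy_raw_floats(const std::uint8_t* src, std::size_t count) {
    std::vector<float> out(count);
    if (count != 0)
        std::memcpy(out.data(), src, count * sizeof(float));
    return out;
}
```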

    But why not do everything in a single loop, without the temporary buffers?

    Just keep the float vector (input or output) and the data buffer (a uint8_t vector). For sending, iterate over the float input vector and, for each element, perform the byte swap and write its 4 bytes to the data buffer, one element at a time. That way you need no intermediate buffers, and it will probably not be any slower. For receiving, do the reverse.
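    A sketch of that single-loop approach, with hypothetical function names. Instead of htonl/ntohl it extracts and assembles the bytes with shifts, which produces big-endian order on a host of either endianness; that substitution is a choice made here, not something the answer prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Append each float to `out` as 4 big-endian bytes, one element at a time.
void encode_be(const std::vector<float>& in, std::vector<std::uint8_t>& out) {
    out.reserve(out.size() + in.size() * 4);
    for (float f : in) {
        std::uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);  // float -> bits, no aliasing issues
        out.push_back(static_cast<std::uint8_t>(bits >> 24));  // most significant first
        out.push_back(static_cast<std::uint8_t>(bits >> 16));
        out.push_back(static_cast<std::uint8_t>(bits >> 8));
        out.push_back(static_cast<std::uint8_t>(bits));
    }
}

// Read `count` big-endian floats starting at `src` (any alignment) into `out`.
void decode_be(const std::uint8_t* src, std::size_t count, std::vector<float>& out) {
    out.resize(count);
    for (std::size_t i = 0; i < count; ++i, src += 4) {
        std::uint32_t bits = (std::uint32_t{src[0]} << 24) | (std::uint32_t{src[1]} << 16) |
                             (std::uint32_t{src[2]} << 8)  |  std::uint32_t{src[3]};
        std::memcpy(&out[i], &bits, sizeof bits);  // bits -> float
    }
}
```

    For example, encoding 1.0f (bits 0x3F800000) appends the bytes 3F 80 00 00.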

    Use std::bit_cast to convert a float to or from a std::array<uint8_t, 4>. This is the "correct" way in C++20 (you can't bit_cast to a C array directly, because a function cannot return an array type). With this approach you do not need to call ntohl at all; just copy the bytes to or from the buffer in the correct order.
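    A sketch of the bit_cast approach with hypothetical helper names. It uses std::bit_cast when the standard library provides it (C++20) and otherwise falls back to memcpy, the pre-C++20 equivalent. Like the question, from_big_endian_wire assumes a little-endian host, and bit_cast assumes sizeof(std::array<uint8_t, 4>) == sizeof(float), which holds on mainstream implementations.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#if __has_include(<bit>)
#include <bit>
#endif

using Bytes4 = std::array<std::uint8_t, 4>;

// float -> its 4 bytes in host order.
Bytes4 to_bytes(float f) {
#if defined(__cpp_lib_bit_cast)
    return std::bit_cast<Bytes4>(f);
#else
    Bytes4 b;
    std::memcpy(b.data(), &f, sizeof f);
    return b;
#endif
}

// 4 bytes in host order -> float.
float from_bytes(const Bytes4& b) {
#if defined(__cpp_lib_bit_cast)
    return std::bit_cast<float>(b);
#else
    float f;
    std::memcpy(&f, b.data(), sizeof f);
    return f;
#endif
}

// Build a float from 4 big-endian wire bytes on a little-endian host
// (assumption): simply place the bytes in reverse order, no ntohl needed.
float from_big_endian_wire(const std::uint8_t* p) {
    Bytes4 b{p[3], p[2], p[1], p[0]};
    return from_bytes(b);
}
```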