
C++: is it undefined behaviour to create a bitwise copy of an object using a C-style cast to char[] (and back)?


I find this question hard to precisely phrase. I have come across a bizarre situation in a large project of mine where I need to work with lambdas that capture by value. In short, I have something like the following (and variations thereof):

struct my_data_t
{
    std::vector<double> d_data;
    std::vector<int>    i_data;
};

The way I end up operating on these data structures is through lambdas. Let's say I have a lambda that sets the i-th element based on a std::pair:

const auto my_lam = [&](const int i, const std::pair<double, int>& elem)
{
    data.d_data[i] = elem.first;
    data.i_data[i] = elem.second;
};

I can clearly then just do the following:

my_data_t data{{1.0, 1.3, 1.9, 9.3}, {1, 4, 3, 5}};

const auto my_lam = [&](const int i, const std::pair<double, int> elem)
{
    data.d_data[i] = elem.first;
    data.i_data[i] = elem.second;
};

my_lam(1, {100.3, 15});

This works fine, and when I print the contents of the vectors, I see the values I set. However, for hardware reasons that are too complicated/involved to get into here, I am strictly forbidden from capturing data by reference. This is an absolute limit.

It goes without saying that

my_data_t data{{1.0, 1.3, 1.9, 9.3}, {1, 4, 3, 5}};

auto my_lam = [=](const int i, const std::pair<double, int> elem) mutable
{
    data.d_data[i] = elem.first;
    data.i_data[i] = elem.second;
};

my_lam(1, {100.3, 15});

does not work for me for obvious reasons.
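
To make the "obvious reasons" concrete: a `[=]` capture copy-constructs both vectors into the closure, so the writes land in the closure's private copy and the original object is left untouched. A minimal sketch of that behaviour (the helper name `run_by_value_capture` is made up for illustration):

```cpp
#include <utility>
#include <vector>

struct my_data_t
{
    std::vector<double> d_data;
    std::vector<int>    i_data;
};

// Hypothetical helper: returns i_data[1] of the ORIGINAL object after the
// by-value lambda has run.
int run_by_value_capture()
{
    my_data_t data{{1.0, 1.3, 1.9, 9.3}, {1, 4, 3, 5}};

    // [=] copy-constructs `data` (both vectors) into the closure object.
    auto my_lam = [=](const int i, const std::pair<double, int> elem) mutable
    {
        data.d_data[i] = elem.first;   // writes to the closure's copy
        data.i_data[i] = elem.second;  // writes to the closure's copy
    };

    my_lam(1, {100.3, 15});

    // Outside the lambda, `data` is the original, which was never modified.
    return data.i_data[1];
}
```

Calling `run_by_value_capture()` yields the original value 4 rather than 15, and every call site also pays for a deep copy of both vectors.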

The current solution in my codebase is a "view-like" implementation, e.g.

template <typename data_t> struct vec_image_t
{
    data_t* raw;
    std::size_t c_size;
    // indexing operators, etc
};

and then modifying my_data_t to be something like

template <template <typename> class container_t = std::vector> struct my_data_t
{
    container_t<double> d_data;
    container_t<int>    i_data;
};

This works well, but suffers the issue that I need to write an awful lot of boilerplate code to convert a my_data_t<std::vector> to a my_data_t<vec_image_t>, meaning this approach does not scale well (I have many variations of my_data_t). It is also a complete headache when const comes into the mix.
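
For illustration, the conversion boilerplate described above might look roughly like the following. This is only a sketch: `make_vec_image`, `make_image`, the indexing operator, and `run_view_capture` are assumed names and details, not code from the actual project, and the template template parameter is written as `template <typename...> class` so that both `std::vector` and `vec_image_t` match it regardless of language version.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Non-owning view over contiguous storage, as in the question.
template <typename data_t> struct vec_image_t
{
    data_t*     raw;
    std::size_t c_size;

    // Assumed indexing operator; the view itself stays const, the data does not.
    data_t& operator[](std::size_t i) const { return raw[i]; }
};

template <template <typename...> class container_t = std::vector> struct my_data_t
{
    container_t<double> d_data;
    container_t<int>    i_data;
};

// Hypothetical helper: build a view of a single vector.
template <typename data_t>
vec_image_t<data_t> make_vec_image(std::vector<data_t>& v)
{
    return {v.data(), v.size()};
}

// Hypothetical per-struct conversion. This is the part that has to be
// rewritten by hand for every variation of my_data_t.
my_data_t<vec_image_t> make_image(my_data_t<std::vector>& d)
{
    return {make_vec_image(d.d_data), make_vec_image(d.i_data)};
}

// Capturing the view by value copies only pointers and sizes, so writes
// still reach the original vectors.
int run_view_capture()
{
    my_data_t<std::vector> data{{1.0, 1.3, 1.9, 9.3}, {1, 4, 3, 5}};
    auto img = make_image(data);

    auto my_lam = [=](const int i, const std::pair<double, int> elem)
    {
        img.d_data[i] = elem.first;
        img.i_data[i] = elem.second;
    };

    my_lam(1, {100.3, 15});
    return data.i_data[1];
}
```

Every new member added to a `my_data_t` variation needs a matching line in its `make_image`, which is where the scaling problem comes from.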

I tried to dream up some solutions to this problem when I realized that all I really need is a my_data_t (as shown in the first implementation without the template template) whose copy constructor and destructor are never called. I looked to c++20's bit_cast for this and it seems to be what I want, but it requires that the cast type be trivially copyable, and in the case of e.g. std::vector, this is not satisfied.
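
The trivially-copyable requirement can be checked directly with a type trait. The sketch below uses a made-up `plain_data_t` for contrast and a hypothetical helper `is_byte_copyable`; it only tests the precondition and does not call `bit_cast` itself.

```cpp
#include <type_traits>
#include <vector>

struct my_data_t
{
    std::vector<double> d_data;
    std::vector<int>    i_data;
};

// Hypothetical counterpart with no owning members, for contrast.
struct plain_data_t
{
    double d;
    int    i;
};

// Hypothetical helper: true when a byte-wise copy of T is permitted,
// i.e. when T satisfies the requirement bit_cast places on its types.
template <typename T>
constexpr bool is_byte_copyable()
{
    return std::is_trivially_copyable<T>::value;
}

static_assert(is_byte_copyable<plain_data_t>(),
              "plain members only: byte copies are fine");
static_assert(!is_byte_copyable<std::vector<int>>(),
              "vector has a non-trivial copy constructor and destructor");
static_assert(!is_byte_copyable<my_data_t>(),
              "a struct of vectors inherits the restriction");
```

A struct is only trivially copyable if all of its members are, so any `my_data_t` variation holding a `std::vector` fails the check.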

I came up with my own cast, much like bit_cast, just without the trivially copyable requirement:

template <typename thing_t>
requires(sizeof(thing_t) == sizeof(image_t<thing_t>))
image_t<thing_t> make_image(thing_t& thing)
{
    image_t<thing_t> output;
    std::memcpy(&output.raw[0], &thing, output.cpy_size);
    return output;
}

where image_t<thing_t> looks like

template <typename thing_t> struct image_t
{
    constexpr static std::size_t cpy_size = sizeof(thing_t);
    char raw[cpy_size];
    
    thing_t* operator -> ()
    {
        return (thing_t*)(&raw[0]);
    }
    
    thing_t& operator * ()
    {
        return *(thing_t*)(&raw[0]);
    }
    // and const versions...
};

This works perfectly for my small problem:

my_data_t data{{1.0, 1.3, 1.9, 9.3}, {1, 4, 3, 5}};
auto d_img = make_image(data);
auto my_lam = [=](const int i, const std::pair<double, int> elem) mutable
{
    d_img->d_data[i] = elem.first;
    d_img->i_data[i] = elem.second;
};

my_lam(1, {100.3, 15});

For my larger codebase, this solution would allow me to strip an awful lot of complexity out and make my users' lives easier. It also has the advantage that the image_t<thing_t> acts an awful lot like thing_t*, which fits with the semantics of use. I understand that storing image_t<thing_t> is dangerous, but I see no further danger than storing iterators.

My question is whether or not this is undefined behaviour. I worry that this is UB for obvious reasons, but according to the summary of the standard's rules on cppreference, under "Type aliasing":

Whenever an attempt is made to read or modify the stored value of an object of type DynamicType through a glvalue of type AliasedType, the behavior is undefined unless one of the following is true:

  • ...
  • AliasedType is ... char ...: this permits examination of the object representation of any object as an array of bytes.

This appears to absolve my make_image cast of any UB. Is this the case?


Solution

  • I looked to c++20's bit_cast for this and it seems to be what I want, but it requires that the cast type be trivially copyable, and in the case of e.g. std::vector, this is not satisfied.

    This restriction exists because copying the object representation (whether by bit_cast or memcpy) between objects of any type that is not trivially copyable has undefined behavior.

    So yes, UB.
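
    The aliasing rule quoted in the question only covers one direction: inspecting the bytes of an existing object through a char-type glvalue. It does not make the contents of a char array usable as std::vector objects, because no std::vector was ever constructed in that storage and std::vector is not a type that can come into existence through a byte copy. If capturing a plain pointer by value is acceptable under the hardware constraint (the question does not say, so this is an assumption), a pointer-like handle keeps the same -> semantics with defined behaviour. A minimal sketch; `handle_t`, `make_handle`, and `run_handle_capture` are hypothetical names:

```cpp
#include <utility>
#include <vector>

struct my_data_t
{
    std::vector<double> d_data;
    std::vector<int>    i_data;
};

// Hypothetical non-owning handle: copying it copies one pointer, so no
// copy constructor or destructor of thing_t ever runs.
template <typename thing_t> struct handle_t
{
    thing_t* ptr;

    thing_t* operator->() const { return ptr; }
    thing_t& operator*()  const { return *ptr; }
};

template <typename thing_t>
handle_t<thing_t> make_handle(thing_t& thing)
{
    return {&thing};
}

// Hypothetical check: the lambda captures the handle by value and still
// modifies the original object.
int run_handle_capture()
{
    my_data_t data{{1.0, 1.3, 1.9, 9.3}, {1, 4, 3, 5}};
    auto h = make_handle(data);

    auto my_lam = [=](const int i, const std::pair<double, int> elem)
    {
        h->d_data[i] = elem.first;
        h->i_data[i] = elem.second;
    };

    my_lam(1, {100.3, 15});
    return data.i_data[1];
}
```

    As with the image_t idea, the handle must not outlive the object it points to; the lifetime caveat is the same as for storing iterators.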