Search code examples
c++castingc++11reinterpret-cast

Is reinterpret_cast bad when dealing with low-level byte manipulation?


I'm writing a websocket server and I have to deal with masked data that I need to unmask.

The mask is unsigned char[4], and the data is a unsigned char* buffer as well.

I don't want to XOR byte by byte, I'd much rather XOR 4-bytes at a time.

uint32_t * const end = reinterpret_cast<uint32_t *>(data_+length);
for(uint32_t *i = reinterpret_cast<uint32_t *>(data_); i != end; ++i) {
    *i ^= mask_;
}

Is there anything wrong with the use of reinterpret_cast in this situation?

The alternative would be the following code which isn't as clear and not as fast:

uint64_t j = 0;
uint8_t *end = data_+length;
for(uint8_t *i = data_; i != end; ++i,++j) {
    *i ^= mask_[j % 4];
}

I'm all ears for alternatives, including ones dependent on c++11 features.


Solution

  • The are a few potential problems with the approach posted:

    1. On some systems objects of a type bigger than char needs to be aligned properly to be accessible. A typical requirement for uint32_t is that the object is aligned to an address divisible by four.
    2. If length / sizeof(uint32_t) != 0 the loop may never terminate.
    3. Depending on the endianess of the system mask needs to contain different values. If mask is produced by *reinterpret_cast<uint32_t>(char_mask) of a suitable array this shouldn't be an array.

    If these issues are taken care of, reinterpret_cast<...>(...) can be used in the situation you have. Reinterpreting the meaning of pointers is one of the reasons this operation is there and sometimes it is needed. I would create a suitable test case to verify that it works properly, though, to avoid having to hunt down problems when porting the code to a different platform.

    Personally I would go with a different approach until profiling shows that it is too slow:

    char* it(data);
    if (4 < length) {
        for (char* end(data + length - 4); it < end; it += 4) {
            it[0] ^= mask_[0];
            it[1] ^= mask_[1];
            it[2] ^= mask_[2];
            it[3] ^= mask_[3];
        }
    }
    it != data + length && *it++ ^= mask_[0];
    it != data + length && *it++ ^= mask_[1];
    it != data + length && *it++ ^= mask_[2];
    it != data + length && *it++ ^= mask_[3];
    

    I'm definitely using a number of similar approaches in software which meant to be really faster and haven't found them to be a notable performance problem.