I am writing a class which lets me convert between bytes and the various integer data types. Instead of reversing arrays and then converting data, I have opted to determine if the endianness of the system is the same as the data. If it is, I simply map the data to the integer, like this in the case of a 64-bit integer:
result = (long)(
(buffer[index] << 56) |
(buffer[index + 1] << 48) |
(buffer[index + 2] << 40) |
(buffer[index + 3] << 32) |
(buffer[index + 4] << 24) |
(buffer[index + 5] << 16) |
(buffer[index + 6] << 8) |
(buffer[index + 7]));
And if the endianness of the system and data differ, it would be reversed like so:
result = (long)(
(buffer[index]) |
(buffer[index + 1] << 8) |
(buffer[index + 2] << 16) |
(buffer[index + 3] << 24) |
(buffer[index + 4] << 32) |
(buffer[index + 5] << 40) |
(buffer[index + 6] << 48) |
(buffer[index + 7] << 56));
result is a 64-bit signed integer
buffer is a byte array
index is a 32-bit signed integer indicating the position in the buffer to begin reading
My question is... am I doing this wrong or is this just a really simple way to do the conversion without having to reverse the array in place or make copies?
This seems like it should work for all combinations of system and data endianness and convert between the two correctly.
Is there perhaps a different way that may be easier to read or generally more simple?
There are two main scenarios when converting between integers and their byte representation:
The first is native endianness. This is typically the case when interoperating with native code. Use code that naturally uses native endianness, such as Buffer.BlockCopy, BitConverter.GetBytes/ToInt64 and unsafe code. In some cases the p/invoke marshaller can do most of the work for you.
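As a minimal sketch of the native-endianness case, the BitConverter calls mentioned above can be wrapped like this. The class and method names (NativeEndianDemo, ReadNative, WriteNative) are made up for illustration; only BitConverter.ToInt64, BitConverter.GetBytes and BitConverter.IsLittleEndian come from the framework.

```csharp
using System;

// Hypothetical helper, named for this example only.
static class NativeEndianDemo
{
    // Reads 8 bytes starting at index using the byte order of the current CPU.
    public static long ReadNative(byte[] buffer, int index)
    {
        return BitConverter.ToInt64(buffer, index);
    }

    // Produces the 8 bytes of value in the byte order of the current CPU.
    public static byte[] WriteNative(long value)
    {
        return BitConverter.GetBytes(value);
    }
}
```

The result depends on the machine: BitConverter.IsLittleEndian reports which order is used, so the same bytes can produce different values on different systems. That is fine when talking to native code on the same machine, and a problem for files or network data.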
The second is a specific endianness that is fixed by the data format. This is typically the case when parsing files or network protocols. In that case your code pieces (minus the casting bug) are the ideal way to handle it. Give them a name that mentions the endianness, such as ToInt64BigEndian.
They are easy to understand, easy to test (don't depend on system endianness) and reasonably fast.
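For reference, here is one way the two snippets from the question might look with the casting bug fixed. In the question's code, buffer[index] << 56 promotes the byte to a 32-bit int, and for int operands C# uses only the low five bits of the shift count, so the shift is by 24 rather than 56. Casting each byte to ulong before shifting avoids that. The class name EndianReader and the name ToInt64LittleEndian are assumptions for this sketch.

```csharp
// Hypothetical container class; the method bodies follow the question's code.
static class EndianReader
{
    // Interprets 8 bytes as a big-endian (most significant byte first) integer.
    public static long ToInt64BigEndian(byte[] buffer, int index)
    {
        // Widen each byte to 64 bits BEFORE shifting.
        return unchecked((long)(
            ((ulong)buffer[index] << 56) |
            ((ulong)buffer[index + 1] << 48) |
            ((ulong)buffer[index + 2] << 40) |
            ((ulong)buffer[index + 3] << 32) |
            ((ulong)buffer[index + 4] << 24) |
            ((ulong)buffer[index + 5] << 16) |
            ((ulong)buffer[index + 6] << 8) |
            (ulong)buffer[index + 7]));
    }

    // Interprets 8 bytes as a little-endian (least significant byte first) integer.
    public static long ToInt64LittleEndian(byte[] buffer, int index)
    {
        return unchecked((long)(
            (ulong)buffer[index] |
            ((ulong)buffer[index + 1] << 8) |
            ((ulong)buffer[index + 2] << 16) |
            ((ulong)buffer[index + 3] << 24) |
            ((ulong)buffer[index + 4] << 32) |
            ((ulong)buffer[index + 5] << 40) |
            ((ulong)buffer[index + 6] << 48) |
            ((ulong)buffer[index + 7] << 56)));
    }
}
```

Neither method checks the system's endianness. The choice between them depends only on the byte order of the data, so both give the same answer on every machine and both code paths can be unit tested anywhere.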
Occasionally it can give a performance boost to use Buffer.BlockCopy or unsafe reinterpret casting, but I'd only use those after profiling indicates a bottleneck in this code. In my programs this has never been a bottleneck, so I use code pretty similar to your examples.
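If profiling does point at this code, a bulk copy might look roughly like the sketch below. The wrapper (BulkCopyDemo.ToInt64ArrayNative) is a made-up name; Buffer.BlockCopy is the real framework call, and its offsets and count are measured in bytes. It copies the bytes as they are, so the resulting values use the machine's native byte order.

```csharp
using System;

// Hypothetical helper, named for this example only.
static class BulkCopyDemo
{
    // Copies count 64-bit integers out of buffer, starting at byte offset index.
    // The bytes are reinterpreted in the CPU's native byte order.
    public static long[] ToInt64ArrayNative(byte[] buffer, int index, int count)
    {
        long[] result = new long[count];
        // Source offset, destination offset and length are all in bytes.
        Buffer.BlockCopy(buffer, index, result, 0, count * sizeof(long));
        return result;
    }
}
```

This is only correct when the data is known to be in native order, which is why it belongs to the first scenario rather than to file or protocol parsing.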
I don't like reversal-based code for this, since the code path for big-endian systems won't get exercised on a typical little-endian system.
ErrataRob's code review of Silent Circle makes a similar point, elaborating a bit more:
Protocol parsing is CPU independent. There is never a reason to do something different depending upon the CPU.
Casting and byte-swapping
The mistake of doing an #if conditional above comes from trying to fix an underlying mistake of casting between char* and int*. This is a common technique taught in your “UNIX Network Programming” class. It’s also wrong. You should never do it when parsing packets.
There are two reasons to avoid this. The first is that (as mentioned above) some CPUs, such as SPARC and some versions of ARM crash when referencing unaligned integers. This makes network code unstable on RISC systems, because most integers are usually aligned anyway, meaning a lot of alignment issues escape undetected into shipping code. The only way to make stable code is to stop casting integers in network (or file) parsers.
The second problem is that it causes confusion with byte-order/endianness that doesn’t happen if you just don’t cast integers. Consider the IP address “10.1.2.3”. There are only two forms for this number, either an integer with the value of 0x0a010203, or an array of bytes with the value 0a 01 02 03. The problem is that little endian machines are weird. The integer 0x0a010203 is represented internally as 03 02 01 0a on x86 processors, with the order of bytes “swapped”.
But this is just an internal detail that YOU NEVER NEED TO WORRY ABOUT. As long as you never cross the streams and cast from a char* to an int* (or the reverse), then the byte-order/endianness never matters.
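The IP address example from the quote can be sketched in C# the same way as the snippets in the question, with shifts and no pointer casts. The names below (Ipv4Demo, ToUInt32BigEndian, WriteUInt32BigEndian) are hypothetical and only illustrate the idea.

```csharp
// Hypothetical helper, named for this example only.
static class Ipv4Demo
{
    // Turns the wire bytes 0a 01 02 03 into the integer 0x0a010203
    // on any CPU, because it never looks at how the integer is stored.
    public static uint ToUInt32BigEndian(byte[] buffer, int index)
    {
        return ((uint)buffer[index] << 24) |
               ((uint)buffer[index + 1] << 16) |
               ((uint)buffer[index + 2] << 8) |
               (uint)buffer[index + 3];
    }

    // Writes the integer back out as wire bytes, most significant byte first.
    public static void WriteUInt32BigEndian(uint value, byte[] buffer, int index)
    {
        unchecked
        {
            buffer[index] = (byte)(value >> 24);
            buffer[index + 1] = (byte)(value >> 16);
            buffer[index + 2] = (byte)(value >> 8);
            buffer[index + 3] = (byte)value;
        }
    }
}
```

Shifts operate on the value of the integer, not on its layout in memory, so the internal 03 02 01 0a representation on x86 never shows up in this code.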