I have a function that takes in an enum and a uint16_t value. The enum represents the type, and each type may either be of length 1 or 2 bytes. The function then prepares a packet to be sent over the network, and the packet size may either be 2 or 3 bytes, depending on the length of the data. The data needs to be converted to network byte order before sending.
In order to handle this, I have the following code. The real code is more defensive and I tried to simplify as much as I can for the purposes of this question.
bool set_config(param_type type, uint16_t value)
{
size_t len = get_param_len(type); // returns either 1 or 2
if(len == 1 && value > 255){
return false;
}
// serializing the data to a packet
union{
uint8_t byte;
uint16_t ushort;
uint8_t bytes[2];
}data;
if(len == 2){
data.ushort = value;
data.ushort = htons(data.ushort)
}else{
data.byte = value;
}
uint8_t packet[3] = {0};
packet[0] = type;
// does this make sense? Or is this undefined behaviour?
memcpy(&packet[1], &data.bytes[0], len);
return send_packet(packet, 1 + len); // sends the packet which may be 2 or 3 bytes
}
Is it a good idea to copy the data to the packet in this manner? Internally in the union, how is the data stored? Is the byte parameter in the data union always the same as bytes[0]? Or is this compiler specific?
In short, the answer is no, its not a safe.
A simple change made to a compiler setting or environmental differences between platforms could result in the data being interpreted incorrectly by the receiving application.
As you know the the memory size of a union is equivalent to its largest member. In simplistic terms the union defined as:
union{
uint8_t byte;
uint16_t ushort;
uint8_t bytes[2];
}data;
Will take up a minimum of 2 bytes, but the order of byte packing and how the bytes are packed is determined by compiler settings such as whether multi-byte values are ordered by least or most significant byte first and also microprocessor architecture.
For example:
The two bytes for byte
and ushort
could be packed as follows:
byte | byte| |
byte | | byte|
ushort | MSB | LSB |
ushort | LSB | MSB |
As you can see the value for byte
may be stored in the first or the second byte, similarly the data stored for ushort
may appear to be reversed with the Most Significant Byte appearing first in one example and the Least Significant Byte appearing first in the other.
Each of the above examples may be determined by the compiler and its settings.
To make matter worse some microprocessors will rearrange bytes dependent on their architecture for example when looking at uint32_t
.
Instead of uint32_t
being stored as |byte0|byte1|byte2|byte3|
it may be stored as |byte1|byte0|byte3|byte2|
.
If your union changed to
union{
uint8_t byte;
uint16_t ushort;
sytuct{
uint8_t byte[3];
}multiByte;
};
The matters become even more complex as now you will most probably have data alignment issues. On a 16 bit processor the 3 bytes present within the multiByte
structure will be placed upon a 16 bit boundary causing the union to take up 4 bytes and not 3 bytes of data by default.
So if you are reliant on compiler settings or architecture to ensure that your data packing is consistent then the project isn't supportable long term and it may not be portable without change.
Hence for safety it is best to be pedantic and process the data to ensure that it is in the correct order both for transmission and on reception.