
C++ struct bit field doesn't parse data correctly


I'm trying to extract fields from a VLAN header using a packed struct.

I created this struct:

#pragma pack(push, 1)
struct vlan_header
{
    uint16_t PCP : 3,
             DEI : 1,
             ID : 12;
};
#pragma pack(pop)

Then I take a uint8_t array and try to extract the fields from it:

uint8_t* data;
vlan_header* vlanHeader;
data = new uint8_t[2];
data[0] = 0;
data[1] = 0x14; // data is 00 14
                // That means PCP is 0, DEI is 0 and vlan id is 20
vlanHeader = (vlan_header*)data;
std::cout << "PCP: " << vlanHeader->PCP << std::endl;
std::cout << "DEI: " << vlanHeader->DEI << std::endl;
std::cout << "ID: " <<  vlanHeader->ID << std::endl;
delete[] data;

The output is:

PCP: 0
DEI: 0
ID: 320

Clearly, the VLAN ID comes out as 320 and not 20, which is not what I intended. I assume the problem is endianness (my machine is little endian) and I have no idea how to resolve the problem elegantly.

Maybe bit fields aren't the right tool for the job?


Solution

  • The OP asked this:

    I assume the problem is endianness (my machine is little endian) and I have no idea how to resolve the problem elegantly.

    Maybe bit fields aren't the right tool for the job?

    Although taking the machine's endianness into account is always an excellent consideration when working with bit fields or unions, and something that shouldn't be forgotten, in your current situation I do not see endianness as the cause of any problems. As for the second part of the question, it all depends on your specific needs. If the code is written exclusively for a specific architecture, OS, and platform and isn't likely to be ported, there is nothing wrong with using bit fields as long as they are constructed properly. If you do decide to port to other machines you can still use bit fields, but you have to be much more careful and may have to write extra code, such as preprocessor directives or switch-case logic, so that the code does one thing on one machine and another thing on a different one.
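
    If the code does need to stay portable, one alternative worth sketching (this is my own illustration, not the OP's code; the names VlanFields and parse_vlan are made up for the example) is to skip the bit-field overlay entirely and extract the fields with explicit shifts and masks, which behaves the same on any compiler, platform, or byte order:

    #include <cstdint>
    #include <iostream>

    struct VlanFields { uint16_t pcp, dei, id; };  // plain values, not bit-fields

    VlanFields parse_vlan(const uint8_t* data)
    {
        // The VLAN TCI is transmitted big-endian: PCP is the top 3 bits,
        // DEI is the next bit, and the 12-bit ID fills the rest.
        const uint16_t tci = static_cast<uint16_t>((data[0] << 8) | data[1]);
        return { static_cast<uint16_t>(tci >> 13),
                 static_cast<uint16_t>((tci >> 12) & 0x1),
                 static_cast<uint16_t>(tci & 0x0FFF) };
    }

    int main()
    {
        const uint8_t data[2] = { 0x00, 0x14 };  // the bytes from the question
        const VlanFields v = parse_vlan(data);
        std::cout << v.pcp << " " << v.dei << " " << v.id << std::endl;  // prints: 0 0 20
    }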

    When using bit fields, I think endianness mainly comes into consideration when you mix types:

    struct Bitfield {
        unsigned a : 10,
                 b : 10,
                 c : 16;
        int      x : 10,
                 y : 10,
                 z : 16;
    };
    

    Something like the above would probably need endianness taken into consideration.


    Looking at your bit-field structure, what I see is a misunderstanding of how the bits are laid out within the bit field versus how the struct itself is aligned.

    With your current struct being:

    #pragma pack(push, 1)
    struct vlan_header {
        // uint16_t = 2bytes: - 16bits to work with
        uint16_t PCP : 3,  // bit(s) 0-2
                 DEI : 1,  // bit(s) 3
                 ID : 12;  // bit(s) 4-15
    };
    #pragma pack(pop)
    

    You are packing the struct to the smallest possible alignment of 1 byte, so member boundaries within this struct fall on 8-bit boundaries. Not a big deal, and pretty self-explanatory. You are then using uint16_t, which is typically a typedef for unsigned short: 2 bytes, or 16 bits, to work with, with values ranging over [0, 65535].

    Within the struct you then give the bit-field members PCP, DEI, and ID widths of 3, 1, and 12 bits respectively. I added comments to your struct to show this pattern.

    In your main code you declare a pointer to uint8_t, declare a pointer to the struct above, and then allocate a dynamic array of size [2] for the data pointer. Here uint8_t is a typedef for unsigned char, which is 1 byte, or 8 bits, and since you have 2 of them you have 2 bytes, or 16 bits, in total. So the total memory size matches between your bit-field struct and the data[] array.
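
    As a quick aside, that size match can be verified directly; this is just an illustrative check, not part of the original code:

    #include <cstdint>

    #pragma pack(push, 1)
    struct vlan_header {
        uint16_t PCP : 3, DEI : 1, ID : 12;
    };
    #pragma pack(pop)

    // Both the packed bit-field struct and two uint8_t elements occupy 2 bytes,
    // so the cast at least covers the right amount of memory.
    static_assert(sizeof(vlan_header) == 2, "packed vlan_header is 2 bytes");
    static_assert(sizeof(uint8_t) * 2 == 2, "data[] is also 2 bytes");

    int main() { return 0; }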

    You then populate the array by indexing into it and assigning hex values, and you map those bytes onto your bit field by casting the pointer to that type. What I think you are assuming is that data[0] should cover the first two members of the bit field and that data[1] should cover the last member, ID. That is not the case.

    What is happening here is that in this part of your code:

    data[0] = 0;
    data[1] = 0x14; // data is 00 14
    

    The above does not do what you think it does.
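
    Before getting to the chart, here is a small illustration of my own (not part of the original code) of why those two bytes read back as ID = 320, assuming, as the rest of this answer does, that data[0] feeds PCP, DEI, and the low 4 bits of ID while data[1] supplies ID's high 8 bits:

    #include <cstdint>
    #include <iostream>

    int main()
    {
        const uint8_t data[2] = { 0x00, 0x14 };
        // data[0] = 0x00 -> PCP = 0, DEI = 0, and ID's low 4 bits are 0.
        // data[1] = 0x14 -> ID's high 8 bits are 0x14, so ID = 0x14 << 4 = 0x140.
        const unsigned id = (data[0] >> 4) | (data[1] << 4);
        std::cout << id << std::endl;  // prints 320, matching the output above
    }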


    I would make a chart to show you examples, but it is too large to display here; instead, here is a bit of code you can run on your machine to produce a log file so you can see the patterns.

    #include <cstdint>   // for uint8_t and uint16_t
    #include <iostream>
    #include <fstream>
    
    #pragma pack(push, 1)
    struct vlan_header {
        // uint16_t = 2bytes: - 16bits to work with
    
        uint16_t PCP : 3,  // bit(s) 0-2
                 DEI : 1,  // bit(s) 3
                 ID : 12;  // bit(s) 4-15
    };
    #pragma pack(pop)
    
    int main() {            
        uint8_t* data; // sizeof(uint8_t) = 1byte - 8bits           
        vlan_header* vlanHeader;
        data = new uint8_t[2];
    
        std::ofstream log;
        log.open( "results.txt" );
        for ( unsigned i = 0; i < 256; i++ ) {
            for ( unsigned j = 0; j < 256; j++ ) {
                data[0] = j;
                data[1] = i;
    
                std::cout << "data[0] = " << static_cast<unsigned>(data[0]) << " ";
                std::cout << "data[1] = " << static_cast<unsigned>(data[1]) << " ";
    
                log << "data[0] = " << static_cast<unsigned>(data[0]) << " ";
                log << "data[1] = " << static_cast<unsigned>(data[1]) << " ";
    
                vlanHeader = reinterpret_cast<vlan_header*>(data);
                std::cout << "PCP: " << std::hex << vlanHeader->PCP << " ";
                std::cout << "DEI: " << std::hex << vlanHeader->DEI << " ";
                std::cout << "ID: " << std::hex << vlanHeader->ID << std::endl;
    
                log << "PCP: " << std::hex << vlanHeader->PCP << " ";
                log << "DEI: " << std::hex << vlanHeader->DEI << " ";
                log << "ID: " << std::hex << vlanHeader->ID << std::endl;
            }   
        }    
        log.close();
    
        delete[] data;
    
    
        std::cout << "\nPress any key and enter to quit." << std::endl;
        char q;
        std::cin >> q;
    
        return 0;
    }
    

    If you look at the patterns, it becomes quite evident what is happening.


    Let's look at the first few iterations from the generated file, simplified for display here.

    // Values are represented in hex
    // For field member PCP: remember that 3 bits can only hold a max value of 7
    // 8-bits     8-bits     3-bits   1-bit   12-bits
    // data[0]    data[1]    PCP      DEI     ID  
       0x00       0x00       0        0       0
       0x01       0x00       1        0       0
       0x02       0x00       2        0       0
       0x03       0x00       3        0       0
       0x04       0x00       4        0       0
       0x05       0x00       5        0       0
       0x06       0x00       6        0       0
       0x07       0x00       7        0       0   // PCP at max value, since 3 bits give only 2^3 = 8 combinations
       0x08       0x00       0        1       0
       0x09       0x00       1        1       0
       0x0a       0x00       2        1       0
       0x0b       0x00       3        1       0
       0x0c       0x00       4        1       0
       0x0d       0x00       5        1       0
       0x0e       0x00       6        1       0
       0x0f       0x00       7        1       0  // the next iteration is where the bit carries into ID
       0x10       0x00       0        0       1
       // And this pattern repeats throughout until ID reaches its max value.
    

    What is happening in memory with your bit field is that the first byte, or 8 bits, covers PCP and DEI as well as the first 4 bits of ID, and I think this is where you are getting confused. As SoronelHaetir stated in their brief answer, if you want your three bit-field members to have the decimal values {0, 0, 20}, then you need to set your data array to data[0] = 0x40 and data[1] = 0x01 respectively. The bits from data[0] spill over into the next bit-field member whenever a value exceeds what that member's allotted bits can hold.

    What this basically means is that PCP has 3 available bits, giving 2^3 = 8 combinations, so PCP can store values in [0, 7]. DEI has only 1 bit, so it acts as a single-bit boolean flag storing [0, 1]. Finally, ID has 12 bits available, where the first 4 come from data[0] and the last 8 come from data[1]; that gives 2^12 = 4096 combinations, values in the range [0, 4095], or a maximum of 0xFFF in hex. This can all be seen in the log or results file.
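
    Going the other way, a small helper sketch (my own; the name encode is made up for illustration) computes which bytes to store for a desired set of field values under this layout, and it lands exactly on SoronelHaetir's suggestion of data[0] = 0x40 and data[1] = 0x01 for {0, 0, 20}:

    #include <cstdint>
    #include <iostream>

    // PCP, DEI and ID's low 4 bits live in data[0]; ID's high 8 bits live in data[1].
    void encode(uint8_t* data, unsigned pcp, unsigned dei, unsigned id)
    {
        data[0] = static_cast<uint8_t>((pcp & 0x7) | ((dei & 0x1) << 3) | ((id & 0xF) << 4));
        data[1] = static_cast<uint8_t>(id >> 4);
    }

    int main()
    {
        uint8_t data[2];
        encode(data, 0, 0, 20);  // the values the OP wants: PCP = 0, DEI = 0, ID = 20
        std::cout << std::hex << static_cast<unsigned>(data[0]) << " "
                  << static_cast<unsigned>(data[1]) << std::endl;  // prints: 40 1
    }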


    I will also show the alignment of your data[] array in parallel with your bit field:

                           First Byte          |       Second Byte 
                 data[0]                       |  data[1]
    data[n]:   ([0][0][0]) ([0])-([0][0][0][0] | [0][0][0][0]-[0][0][0][0])
                                               |
                  PCP       DEI   ID           |
    bitfield:  ([0][0][0]) ([0])-([0][0][0][0] | [0][0][0][0]-[0][0][0][0])
    

    EDIT

    The OP mentioned these statements in a comment to this answer:

    I don't get the "The bits from data[0] are overflowing into the other bitfield members when that member can no longer contain a high enough value", I don't see where the overflow is occurring – Lior Sharon

    and

    also according to your alignment at the bottom of the answer what I did should work because only the second byte of the data1 is used by the bitfield when the ID is 20

    What I'll try to do here is show the bit patterns for data[0] and data[1] with the values 0x40 and 0x01:

    Byte 1                       Byte 2
    data[0] = 0x40               data[1] = 0x01
    [0][1][0][0] [0][0][0][0] |  [0][0][0][0] [0][0][0][1] 
    

    This is what the bit pattern should look like for data before you cast it to your bit-field struct. Now let's look at the bit field with all 0s before the cast, and then look at the hex values in relation to the bit-field members and the values they can store. I already stated that PCP can store values in [0, 7], DEI can store [0, 1], and ID can store [0, 4095] in decimal. You are assigning hex values into the two bytes, or 16 bits, of memory. You want PCP and DEI to have a value of 0 while ID has a value of 20 in decimal. You are thinking that 0x00 in the first byte will give both PCP and DEI the value 0 and that 0x14 in the second byte should give ID the value 20. That will not work: 0x14 as a hex value represents one byte in memory, but ID has 12 bits, or 1.5 bytes, to fill. Referring to the chart above, member PCP has only 3 bits to store, so if we put the value 7 into data[0], PCP would look like [1][1][1] in binary. Without even touching data[1]'s byte, we can push values into both the DEI and ID members.

      Byte1 =                        Byte 2 =            
      ============================|==========================
      PCP       DEI  ID
      0x00                           0x00
      [0][0][0] [0]  [0][0][0][0] | [0][0][0][0] [0][0][0][0]
      0x01                           0x00
      [0][0][1] [0]  [0][0][0][0] | [0][0][0][0] [0][0][0][0]
      0x02                           0x00
      [0][1][0] [0]  [0][0][0][0] | [0][0][0][0] [0][0][0][0]
      0x03                           0x00
      [0][1][1] [0]  [0][0][0][0] | [0][0][0][0] [0][0][0][0]
      0x04                           0x00
      [1][0][0] [0]  [0][0][0][0] | [0][0][0][0] [0][0][0][0]
      0x05                        |  0x00
      [1][0][1] [0]  [0][0][0][0] | [0][0][0][0] [0][0][0][0]
      0x06                           0x00
      [1][1][0] [0]  [0][0][0][0] | [0][0][0][0] [0][0][0][0]
      0x07                           0x00
      [1][1][1] [0]  [0][0][0][0] | [0][0][0][0] [0][0][0][0]
      0x08                           0x00
      [0][0][0] [1]  [0][0][0][0] | [0][0][0][0] [0][0][0][0]
      0x09                           0x00
      [0][0][1] [1]  [0][0][0][0] | [0][0][0][0] [0][0][0][0]
      0x0A                           0x00
      [0][1][0] [1]  [0][0][0][0] | [0][0][0][0] [0][0][0][0]
      0x0B                           0x00
      [0][1][1] [1]  [0][0][0][0] | [0][0][0][0] [0][0][0][0]
      0x0C                           0x00
      [1][0][0] [1]  [0][0][0][0] | [0][0][0][0] [0][0][0][0]
      0x0D                           0x00
      [1][0][1] [1]  [0][0][0][0] | [0][0][0][0] [0][0][0][0]
      0x0E                           0x00
      [1][1][0] [1]  [0][0][0][0] | [0][0][0][0] [0][0][0][0]
      0x0F                           0x00
      [1][1][1] [1]  [0][0][0][0] | [0][0][0][0] [0][0][0][0]
    
      // When we increment the hex value from 0x0F to 0x10 (decimal 16),
      // the overflow into the ID member happens: at this point PCP is at its
      // max value of 7 and DEI is at its max value of 1, with all bits full.
      // Watch what happens on the next iteration. Also note that we never gave
      // any values to data[1], or byte 2; we only gave values to byte 1. This
      // next value will populate a result into the bit field's ID member.
    
      0x10                          0x00
      [0][0][0] [0] [0][0][0][0] | [0][0][0][0] [0][0][0][1]
    
      // then for the next iteration it'll be like this and so on...
      0x11                          0x00
      [0][0][1] [0] [0][0][0][0] | [0][0][0][0] [0][0][0][1]
      0x12                          0x00
      [0][1][0] [0] [0][0][0][0] | [0][0][0][0] [0][0][0][1]
    
      // As this pattern continues, we saw that `0x10` gave us a bit at the right end
      // of member ID, so let's look at the values 0x20, 0x30 & 0x40 in the first byte.
    
      // if 0x10 =                       
      [0][0][0] [0] [0][0][0][0] | [0][0][0][0] [0][0][0][1]
      // then 0x20 should be
      [0][0][0] [0] [0][0][0][0] | [0][0][0][0] [0][0][1][0]
      // and 0x30 should be
      [0][0][0] [0] [0][0][0][0] | [0][0][0][0] [0][0][1][1]
      // finally 0x40 should be 
      [0][0][0] [0] [0][0][0][0] | [0][0][0][0] [0][1][0][0]
      // This is all without touching byte 2.
    
      // Remember, we want both PCP & DEI to have values of 0 but we
      // need a value of 0x14, or 20 in decimal, in ID. Because of this
      // overflow of bits due to the nature of bit fields, we can not
      // just set the bytes directly with regular hex values as normal,
      // because member PCP has only 3 bits, member DEI has only 1, and
      // the rest belong to ID. In order to reach the value we want,
      // we would have to iterate 0x40 all the way up to 0xFF before we would
      // ever use byte 2, making it have a value of 0x01.

      // In other words: 0xFF 0x00 comes before 0x00 0x01 in this sequence of
      // bit patterns, but since we already have the value 0x40 in the first
      // byte of data[n], giving us a bit pattern of
      [0][0][0] [0] [0][0][0][0] | [0][0][0][0] [0][1][0][0]
    
      // what does giving byte 2 a value of 0x01 do to this pattern?

      // It does this:
      [0][0][0] [0] [0][0][0][0] | [0][0][0][1] [0][1][0][0]
    
      // Okay so data[0] = 0x40 and data[1] = 0x01 so how does this
      // give us the values of {0,0,20} or {0x00,0x00,0x14} ?
    
      // Reading from the far left going right, the first 3 bits
      // are PCP and all of them are 0, giving it a value of 0.
    
      // Next is the single bit for DEI which has a value of 0.
    
      // Finally the next 12 bits are for ID and when we look at this 12 bit
      // pattern we have [0][0][0][0] | [0][0][0][1] [0][1][0][0]
    
      // Let's ignore the left 4 bits since they are all 0s at the moment.
      // So we can see that [0][0][0][1] [0][1][0][0] = 0x14 in hex, with a
      // decimal value of 20.
    

    Now, the only caveat is this: I was doing this in MS Visual Studio 2017 CE on an Intel quad-core processor running Windows 7 x64 Home Premium, compiled as an x86 application. Where the bits are actually stored will vary by compiler, OS, and architecture. I have shown the pure mathematical bit representation from left to right, whereas most machines store their bits in a right-to-left ordering. If you are running on a little-endian machine and using a Visual Studio compiler, you should get similar results.

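    With that caveat in mind, here is a minimal end-to-end check (a sketch of my own, assuming a little-endian machine with the same low-bit-first allocation used throughout this answer) that the suggested bytes really do read back as {0, 0, 20}:

    #include <cstdint>
    #include <iostream>

    #pragma pack(push, 1)
    struct vlan_header {
        uint16_t PCP : 3, DEI : 1, ID : 12;
    };
    #pragma pack(pop)

    int main()
    {
        uint8_t data[2] = { 0x40, 0x01 };  // SoronelHaetir's suggested bytes
        const vlan_header* v = reinterpret_cast<const vlan_header*>(data);
        std::cout << "PCP: " << v->PCP << " DEI: " << v->DEI
                  << " ID: " << v->ID << std::endl;  // expected: PCP: 0 DEI: 0 ID: 20
    }
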

    Here are some nicely written articles about bit fields; if I come across more, I'll post them here: