Search code examples
csoftware-designmisra

Design of a method and data structures to process various data types


I need a method in C that is able to process a data packet that can be of various formats/types. What would be your recommended approach to do so (note that I want to be MISRA compliant as much as possible). Currently, I am considering the following options (but open to learn additional ones), as per the code oultined below

Option 1: the method explicitly states the format of the data packet (and there is a switch case inside the method) Option 2: the data includes the data type and a union with the content

Which of these options would you prefer?

Many thanks for your insights


int process_type1(const struct type1* data) {

    /* do stuff with data */
}

int process_type2(const struct type2* data) {

    /* do stuff with data */
}


/* Option 1: explicit data type in method to process data */

int process_1(const enum data_type data_type, const uint8_t* data, const size_t* n_bytes) {

    int res = -1;

    switch (data_type) {
        case TYPE_1: res = process_type1((const struct type1*)data); break;
        case TYPE_2: res = process_type2((const struct type2*)data); break;
        default:
            break;
    }

    return res;
}


/* Option 2: implicit data type in method to process data and union to include the data */
struct packet {
    enum data_type {
        TYPE_1,
        TYPE_2,
        /* Add more types as needed */
    } data_type;

    union {
        struct type1 type1;
        struct type2 type2;
        /* Add more types as needed */
    } data;
};

int process_2(const struct packet* packet) {
    int res = -1;

    switch (packet->data_type) {
        case TYPE_1:
            res = process_type1(&packet->data.type1);
            break;
        case TYPE_2:
            res = process_type2(&packet->data.type2);
            break;
        default:
            break;
    }

    return res;
}

Solution

  • Overall, structs and unions are problematic for portable modelling of raw byte streams, such as data protocols:

    • There might be alignment concerns as well as padding. MISRA C's (11.3) main argument against wild and crazy pointer conversions is alignment.
    • There might be a different network endianness.
    • Type punning between raw byte streams and struct/union could cause "strict aliasing" undefined behavior.
    • MISRA C prohibits the use of the same memory area for two unrelated purposes. You may only use union type punning between different representations of the same data, which wouldn't be the case here.

    As for (const struct type1*)data followed by a de-refencing of type1 members, it is a strict aliasing violation and undefined behavior, regardless of MISRA. It doesn't matter if uint8_t is a character type because the character type exception to the strict aliasing rules is only valid when going from a larger type to a character type, not the other way around. (Unless you use another trick which I will show further down.)

    The only truly portable, well-defined, MISRA-compliant way is to write serialization/deserialization routines that copy the data member by member. Which you will have to do anyway in case facing a different network endianness compared to your CPU endianness. The down side to this is that you have to copy the data.


    Otherwise, this is a possible work-around:

    typedef union
    {
      struct
      {
        // specific members
      };
      uint8_t raw_data [expected_size];
    } type1;
    

    This assuming uint8_t is a character type (pretty much dead certain in practice). After determining the type of the packet, have the function take a uint8_t* parameter. We may now cast that parameter into a type1* because we now have "a union type with a member compatible with the effective type of the object" (uint8_t). That's another exception to the strict aliasing rule.

    Furthermore we no longer type pun between two unrelated structs, but rather between the incoming raw data and one of the possible struct types.

    The anonymous struct style above is optional, but it allows us to type type.member rather than type.some_struct.member. MISRA C does not object against anonymous struct/union.

    We may now cook up a generic API like:

    typedef enum {
      TYPE_1,
      TYPE_2,
      /* ... */
      TYPE_N
    } data_t;
    
    typedef result_t process_func_t (const uint8_t* raw_data);
    
    result_t process_type1 (const uint8_t* raw_data); // type-specific protocol parser
    ...
    
    process_func_t* const process[] =  // branch table
    {
      [TYPE1] = process_type1,
      [TYPE2] = process_type2,
      ...
    };
    
    // ensure integrity between enum and branch table:
    _Static_assert(sizeof(process)/sizeof(*process) == TYPE_N,
                   "process look-up table inconsistent");
    

    Usage:

    data_t data_type;
    ...
    if(data_type < TYPE_1 || data_type >= TYPE_N)
    {
      /* defensive programming, error handling here */
    }
    
    // assuming raw_data is aligned as per the requirements of the various struct types, then:
    process[data_type](raw_data);
    

    This is pretty standard code for safety-related software. And as far as I can tell, MISRA compliant, apart from the cast from uint8_t* to type1* etc that we put inside process_type1. A MISRA deviation from rule 11.3 might be necessary. We don't want a permanent deviation against that rule, because it's a very sound one.