c++ c optimization portability maintainability

How to synchronize C & C++ libraries with minimal performance penalty?

I have a C library with numerous math routines for dealing with vectors, matrices, quaternions and so on. It needs to remain in C because I often use it for embedded work and as a Lua extension. In addition, I have C++ class wrappers that allow for more convenient object management and operator overloading for math operations using the C API. The wrapper consists only of a header file, and it makes as much use of inlining as possible.

Is there an appreciable penalty for wrapping the C code versus porting and inlining the implementation directly into the C++ class? This library is used in time-critical applications. So, does the boost from eliminating indirection compensate for the maintenance headache of two ports?

Example of C interface:

typedef float VECTOR3[3];

void v3_add(VECTOR3 *out, VECTOR3 lhs, VECTOR3 rhs);

Example of C++ wrapper:

class Vector3
{
private:
    VECTOR3 v_;

public:
    // copy constructors, etc...

    Vector3& operator+=(const Vector3& rhs)
    {
        // VECTOR3 is an array type, so cast to the decayed pointer type instead
        v3_add(&this->v_, this->v_, const_cast<float*>(rhs.v_));
        return *this;
    }

    Vector3 operator+(const Vector3& rhs) const
    {
        Vector3 tmp(*this);
        tmp += rhs;
        return tmp;
    }

    // more methods...
};

Solution

Your wrapper itself will be inlined; however, your calls into the C library typically will not. (That would require link-time optimization, which is technically possible but, AFAIK, rudimentary at best in today's tools.)

Generally, a function call as such is not very expensive. Its cycle cost has decreased considerably over the last years, and it can be predicted easily, so the call penalty as such is negligible.

However, inlining opens the door to more optimizations: if you have v = a + b + c, your wrapper class forces the generation of stack variables, whereas for inlined calls, the majority of the data can be kept in the FPU stack. Also, inlined code allows simplifying instructions, considering constant values, and more.
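To make the v = a + b + c case concrete, here is a minimal sketch of the fully inlined variant. The names Vec3 and sum3 are made up for illustration and are not part of the library in the question. Because both additions are visible at the call site, the compiler is free to combine them, though whether it actually does depends on the compiler and its settings.

```cpp
#include <cassert>

// Hypothetical header-only vector type; the name and layout are
// assumptions for illustration, not part of the original library.
struct Vec3 {
    float x, y, z;
};

// Defined in the header, so the compiler sees the body at every call site.
inline Vec3 operator+(const Vec3& a, const Vec3& b)
{
    return Vec3{a.x + b.x, a.y + b.y, a.z + b.z};
}

// With both additions visible, an optimizer is free to fuse them into
// three independent sums and skip materializing the a + b temporary.
inline Vec3 sum3(const Vec3& a, const Vec3& b, const Vec3& c)
{
    return a + b + c;
}

// Small self-check using exactly representable values.
inline void sum3_example()
{
    const Vec3 v = sum3(Vec3{1.0f, 2.0f, 3.0f},
                        Vec3{10.0f, 20.0f, 30.0f},
                        Vec3{100.0f, 200.0f, 300.0f});
    assert(v.x == 111.0f && v.y == 222.0f && v.z == 333.0f);
}
```

With the wrapper from the question, each + instead goes through an opaque v3_add call, so the intermediate result has to exist in memory before the second call can read it.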

So while the "measure before you invest" rule holds true, I would expect some room for improvement here.
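For the measuring part, a rough timing harness could look like the sketch below. time_ns and accumulate_example are hypothetical helpers, not part of any library, and the workload is only a stand-in. A real comparison would time the wrapped call and the inlined variant over the same data, in an optimized build, with several repetitions.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: runs `body` `iterations` times and returns the
// elapsed time in nanoseconds, measured with a monotonic clock.
template <typename F>
std::int64_t time_ns(F body, int iterations)
{
    assert(iterations >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        body(i);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Stand-in workload; in a real measurement the lambda would call the
// wrapped C routine in one run and the inlined variant in another.
inline float accumulate_example(int iterations)
{
    volatile float sink = 0.0f;  // volatile keeps the loop from being optimized away
    time_ns([&sink](int i) { sink = sink + static_cast<float>(i); }, iterations);
    return sink;
}
```

Treat the numbers from such a loop as a first indication only; profiling the real application remains the more reliable check.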

A typical solution is to bring the C implementation into a form that can be used either as inline functions or as a "C" body:

// V3impl.inl
// Single source of truth for the implementation. Do not #include other
// headers from this file, since the C++ header includes it inside a namespace.
V3DECL void v3_add(VECTOR3 *out, VECTOR3 lhs, VECTOR3 rhs)
{
    // here you maintain the actual implementations
    // ...
}

// C header
#define V3DECL
V3DECL void v3_add(VECTOR3 *out, VECTOR3 lhs, VECTOR3 rhs);

// C body
#include "V3impl.inl"


// CPP Header
#define V3DECL inline
namespace v3core {
  #include "V3impl.inl"
} // namespace v3core

class Vector3D { ... };

This likely makes sense only for selected functions with comparatively simple bodies. I'd move the functions into a separate namespace for the C++ implementation, as you will usually not need them directly.

(Note that inline is just a compiler hint; it doesn't force the function to be inlined. But that's good: if the code size of an inner loop exceeds the instruction cache, inlining can easily hurt performance.)

Whether the pass/return-by-reference can be resolved depends on the strength of your compiler. I've seen many where foo(X * out) forces stack variables, whereas X foo() does keep values in registers.
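The two signature styles side by side, as a minimal sketch. V3, v3_add_out and v3_add_val are hypothetical names chosen for this comparison. Both compute the same result; the difference is only in how much freedom the compiler has for placing it.

```cpp
#include <cassert>

// Hypothetical value type used only for this comparison.
struct V3 {
    float x, y, z;
};

// Out-parameter style, mirroring the shape of the C API: the result must
// live at an address, which can force it into memory unless the call is
// inlined and the pointer is optimized away.
inline void v3_add_out(V3* out, const V3& lhs, const V3& rhs)
{
    assert(out != nullptr);
    out->x = lhs.x + rhs.x;
    out->y = lhs.y + rhs.y;
    out->z = lhs.z + rhs.z;
}

// Return-by-value style: no address of the result is exposed to the
// function body, which leaves the compiler more room to keep it in registers.
inline V3 v3_add_val(const V3& lhs, const V3& rhs)
{
    return V3{lhs.x + rhs.x, lhs.y + rhs.y, lhs.z + rhs.z};
}
```

Which one ends up faster is something to check in the generated assembly or a profile for your particular compiler and target.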