Virtual Function Compared to Pointer Casting

The current version of some code I'm using utilises a slightly odd way of acheiving something which I think could be acheived with polymorphism. More concretely we currently use something like

for(int i=0; i<CObjList.size(); ++i)
{
   CObj* W = CObjList[i];
   if( W->type == someTypeA )
   {
       // do some things which also involve casts such as
       //  ((SomeClassA*) W->objectptr)->someFieldA
   }
   else if( W->type == someTypeB )
   {
       // do some things which also involve casting such as
       //  ((SomeClassB*) W->objectptr)->someFieldB
   }
}

To clarify; each object W contains a void *objectptr; that is to say a pointer to an arbitrary location. The field W->type keeps track of what type of object objectptr points at so that inside our if/else statements we can cast W->objectptr to the correct type and use it's fields.

However, this seems inherently bad from a code design stand point for several reasons;

We have no guarantee that the object pointed to by W->objectptr actually matches what is said in W->type so the cast is inherently unsafe.
Every time we wish to add another type we must add another elseif statement and ensure W->type is set correctly.

It seems to be this would be much better solved with something like

class CObj
{
public:
   virtual void doSomething(/* some params */)=0;
};

class SomeClassA : public CObj
{
public:
   virtual void doSomething(/* some params */);
   int someFieldA;
}

class SomeClassB : public CObj
{
public:
   virtual void doSomething(/* some params */);
   int someFieldB;
}

// sometime later...

for(int i=0; i<CObjList.size(); ++i)
{
   CObj* W = CObjList[i];
   W->doSomething(/* some params */);
}

This having been said there is the proviso that in this setting performace is important. This code will be called from a (relatively) tight loop.

My question is then; is the added complexity of a few vtable lookups outweighed by the improved code design and extensibility and is this likely to affect performace alot?

EDIT: It occurs to me that accessing the fields through a pointer in this way could be as bad as vtable lookups anyway due to cache misses etc. Any thoughts on this?

---- EDIT 2: Also I forgot to mention (and I know it's a bit off the original topic), inside the if statements are many calls to member functions of the surrounding class. How would you design the structure so as to be able to call these from inside doSomething()?

Solution

I'm going to answer specifically on the performance angle, because I work in a perf-critical environment and a while ago I happened to run measurements on a similar case to work out the fastest solution.

If you are on an x86, PPC, or ARM processor, you want virtual functions in this situation. The performance cost of calling a virtual function is mostly the pipeline bubble induced by mispredicting an indirect branch. Because the instruction fetch stage of the CPU can't know where the computed jmp goes, it can't start fetching bytes from the target address until the branch executes, and thus you have a stall in the pipeline corresponding to the number of stages between the first fetch stage and the branch retire. (On the PPC I know best, that's something like 25 cycles.)

You also have the latency of loading the vtable pointer, but this is often hidden by instruction reordering (the compiler moves the load instruction so it starts several cycles before you actually need the result and the CPU does other work while the data cache sends you its electrons.)

With the if-cascade approach you instead have some number n of direct, conditional branches — where the target is known at compile time, but whether the jump is taken is determined at runtime. (ie, a jump-on-equal opcode.) In this case the CPU will make a guess (predict) at whether each branch is taken or not, and start fetching instructions accordingly. So, you will only have a bubble if the CPU guesses wrong. Since you are presumably calling this function with different input each time, it's going to mispredict at least one of these branches, and you'll have the exact same bubble that you would with virtuals. In fact, you'll have a whole lot more bubbles — one per if() conditional!

With virtual functions, there's also the risk of an additional data cache miss on loading the vtable, and an icache miss on the jump target. If this function is in a tight loop, then presumably you'll be looking up and calling the same subroutines a lot, and thus the vtable and function bodies will probably still be in cache. You could measure that if you wanted to be really sure.