Using pointers, references, handles to generic datatypes, as generic and flexible as possible

In my application I have lots of different data types, e.g. Car, Bicycle, Person, ... (they're actually other data types, but this is just for the example).

Since I also have quite a lot of 'generic' code in my application, and the application was originally written in C, pointers to Car, Bicycle, Person, ... are often passed as void-pointers to these generic modules, together with an identification of the type, like this:

Car myCar;
ShowNiceDialog ((void *)&myCar, DATATYPE_CAR);

The 'ShowNiceDialog' method now uses meta-information (functions that map DATATYPE_CAR to interfaces that get the actual data out of a Car) to get information about the car, based on the given data type. That way, the generic logic only has to be written once, rather than again for every new data type.
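A minimal sketch of this C-style scheme, written as C++ so it can be compared with the alternatives. The struct members, DATATYPE_PERSON, DATATYPE_COUNT, the kGetName table and DescribeForDialog (a hypothetical stand-in for ShowNiceDialog that returns text instead of showing a dialog) are all assumptions for illustration:

```cpp
#include <cassert>
#include <string>

// Hypothetical data types standing in for the application's real ones.
struct Car    { std::string name; int doors; };
struct Person { std::string name; int age; };

// Type identification passed alongside the void-pointer.
enum DataType { DATATYPE_CAR, DATATYPE_PERSON, DATATYPE_COUNT };

// Meta-information: one accessor per data type that knows how to read
// the name out of an opaque pointer to that type.
using GetNameFn = std::string (*)(const void*);

std::string carName(const void* p)    { return static_cast<const Car*>(p)->name; }
std::string personName(const void* p) { return static_cast<const Person*>(p)->name; }

// Indexed by DataType; must be kept in sync with the enum by hand.
const GetNameFn kGetName[DATATYPE_COUNT] = { carName, personName };

// Generic code: it only sees (void*, DataType) and asks the table.
std::string DescribeForDialog(const void* data, DataType type)
{
    assert(type < DATATYPE_COUNT);
    return "Name: " + kGetName[type](data);
}
```

Nothing stops a caller from passing &myCar together with DATATYPE_PERSON: it compiles and is undefined behavior, which is exactly the type-safety cost of the void-pointer approach.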

Of course, in C++ you could make this much easier by using a common root class, like this:

class RootClass
   {
   public:
      virtual ~RootClass() {}
      virtual std::string getName() const = 0;
   };

class Car : public RootClass
   {
   ...
   };

void ShowNiceDialog (RootClass *root);
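A compilable sketch of the root-class idea: a pure virtual function needs the virtual keyword, and a virtual destructor allows deleting instances through a RootClass pointer. The Car constructor and DescribeForDialog (standing in for ShowNiceDialog) are hypothetical:

```cpp
#include <string>
#include <utility>

// Common root class: every data type exposes what the generic code needs.
class RootClass
{
public:
    virtual ~RootClass() = default;           // safe deletion through RootClass*
    virtual std::string getName() const = 0;  // 'virtual' is required for '= 0'
};

// Hypothetical concrete type.
class Car : public RootClass
{
public:
    explicit Car(std::string model) : model_(std::move(model)) {}
    std::string getName() const override { return "Car " + model_; }

private:
    std::string model_;
};

// Generic code: no type id needed, the v-pointer identifies the type.
std::string DescribeForDialog(const RootClass& root)
{
    return "Name: " + root.getName();
}
```

The price is that every Car now carries a hidden v-pointer, which is the per-instance cost the next paragraph is concerned with.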

The problem is that in some cases, we don't want to store the data type in a class, but in a totally different format to save memory. In some cases we have hundreds of millions of instances that we need to manage in the application, and we don't want to make a full class for every instance. Suppose we have a data type with 2 characteristics:

A quantity (double, 8 bytes)
A boolean (1 byte)

Although we only need 9 bytes to store this information, putting it in a class means that we need at least 16 bytes (because of the padding), and with the v-pointer we possibly even need 24 bytes. For hundreds of millions of instances, every byte counts (I have a 64-bit variant of the application and in some cases it needs 6 GB of memory).
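The padding argument can be checked with sizeof. This sketch assumes a typical 64-bit platform such as x86-64, where double is 8-byte aligned, bool is 1 byte and a v-pointer is 8 bytes; PlainItem and PolymorphicItem are hypothetical names:

```cpp
#include <cstddef>

// The 9 bytes of useful data from the example.
struct PlainItem
{
    double quantity;  // 8 bytes
    bool   flag;      // 1 byte, then tail padding up to alignof(double)
};

// Same data in a polymorphic class: the compiler adds a hidden v-pointer.
struct PolymorphicItem
{
    virtual ~PolymorphicItem() = default;
    double quantity;
    bool   flag;
};

// Bytes actually carrying information.
constexpr std::size_t kPayload = sizeof(double) + sizeof(bool);
```

At, say, 300 million instances that is 4.8 GB for PlainItem and 7.2 GB for PolymorphicItem, against 2.7 GB of actual payload.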

The void-pointer approach has the advantage that we can encode almost anything in a void-pointer and decide how to interpret it when we want information from it (as a real pointer, as an index, ...), but at the cost of type safety.

Templated solutions don't help since the generic logic forms quite a big part of the application, and we don't want to templatize all this. Additionally, the data model can be extended at run time, which also means that templates won't help.

Are there better (and type-safer) ways to handle this than a void-pointer? Any references to frameworks, whitepapers, research material regarding this?

Solution

If you don't want a full class, you should read up on the Flyweight pattern. It's designed to save memory.

EDIT: sorry, lunch-time pause ;)

The typical Flyweight approach is to separate properties that are common to a great number of objects from properties that are specific to a given instance.

Generally, it means:

struct Light
{
  kind_type mKind;
  specific1 m1;
  specific2 m2;
};

The kind_type is often a pointer, but it doesn't have to be. In your case a pointer would be a real waste: on a 64-bit platform it takes 8 bytes, almost as much as the 9 bytes of "useful" information, and it would push each instance from 16 to 24 bytes.
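A sketch of a kind that is a small index rather than a pointer: the shared per-kind data lives once in a table, and each instance stores a 1-byte id. KindInfo, kindTable, registerKind and kindOf are hypothetical, and the 256-kind limit follows from the assumed std::uint8_t id:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Shared state, stored once per kind rather than once per instance.
struct KindInfo
{
    std::string name;
    std::string unit;
};

// Hypothetical registry; a vector so that kinds can be added at run time.
std::vector<KindInfo>& kindTable()
{
    static std::vector<KindInfo> table;
    return table;
}

// Registers a kind and returns its 1-byte id (at most 256 kinds).
std::uint8_t registerKind(std::string name, std::string unit)
{
    assert(kindTable().size() < 256 && "uint8_t kind id exhausted");
    kindTable().push_back({std::move(name), std::move(unit)});
    return static_cast<std::uint8_t>(kindTable().size() - 1);
}

// Per-instance state: the kind is a small index instead of a pointer.
struct Light
{
    double       quantity;
    bool         flag;
    std::uint8_t kind;  // index into kindTable()
};

const KindInfo& kindOf(const Light& light)
{
    return kindTable()[light.kind];
}
```

Because the table is an ordinary vector, new kinds can be registered while the program runs; the expectation that Light stays at 16 bytes again assumes a typical 64-bit layout.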

Here I think we could exploit the padding to store the id. After all, as you said, it's going to be expanded to 16 bytes even though we only use 9 of them, so let's not waste the other 7!

struct Object
{
  double quantity;
  bool flag;
  unsigned char const id;
};

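Note that the order of members matters. Byte offsets on a typical 64-bit platform where double is 8-byte aligned:

quantity, flag, id  -> 16 bytes
[0..7 quantity][8 flag][9 id][10..15 padding]

id, flag, quantity  -> 16 bytes
[0 id][1 flag][2..7 padding][8..15 quantity]

id, quantity, flag  -> 24 bytes
[0 id][1..7 padding][8..15 quantity][16 flag][17..23 padding]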
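The effect of member order can be verified with sizeof and offsetof. The three struct names are hypothetical, and the expected values assume a typical 64-bit platform where double is 8-byte aligned:

```cpp
#include <cstddef>

// The same three members in three orders (hypothetical names).
struct QuantityFirst { double quantity; bool flag; unsigned char id; };
struct IdFirst       { unsigned char id; bool flag; double quantity; };
struct Interleaved   { unsigned char id; double quantity; bool flag; };

// Where the compiler actually placed a few members.
constexpr std::size_t kQuantityFirstId = offsetof(QuantityFirst, id);
constexpr std::size_t kIdFirstQuantity = offsetof(IdFirst, quantity);
constexpr std::size_t kInterleavedFlag = offsetof(Interleaved, flag);
```

The const on id is left out here: a const data member makes the struct non-assignable, which gets in the way of, for example, sorting a std::vector of these objects.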

I don't understand the "extended at runtime" bit. It seems scary. Is this some sort of self-modifying code?

Templates allow you to create a very interesting form of Flyweight: Boost.Variant.

typedef boost::variant<Car,Dog,Cycle, ...> types_t;

The variant can hold any of the types cited here. It can be manipulated by "normal" functions:

void doSomething(types_t const& t);

It can be stored in containers:

typedef std::vector<types_t> vector_t;

And finally, the way to operate on it is a visitor:

struct DoSomething: boost::static_visitor<>
{
  void operator()(Dog const& dog) const;

  void operator()(Car const& car) const;
  void operator()(Cycle const& cycle) const;
  void operator()(GenericVehicle const& vehicle) const;

  template <class T>
  void operator()(T const&) const {}
};
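A compilable sketch of this visitor, swapping in std::variant and std::visit (the C++17 standard-library counterparts of boost::variant and boost::apply_visitor) so that it builds without Boost. The types are hypothetical stand-ins; Truck and Rock are added to exercise the GenericVehicle overload and the catch-all, which is constrained with std::enable_if_t:

```cpp
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical types mirroring the example above.
struct GenericVehicle {};
struct Car   : GenericVehicle {};
struct Cycle : GenericVehicle {};
struct Truck : GenericVehicle {};  // no dedicated overload
struct Dog {};
struct Rock {};                    // unrelated, left to the catch-all

using types_t = std::variant<Car, Dog, Cycle, Truck, Rock>;

struct Describe
{
    std::string operator()(Dog const&) const            { return "dog"; }
    std::string operator()(Car const&) const            { return "car"; }
    std::string operator()(Cycle const&) const          { return "cycle"; }
    std::string operator()(GenericVehicle const&) const { return "vehicle"; }

    // Catch-all, disabled for GenericVehicle children: unconstrained, it would
    // be an exact match for Truck and beat the GenericVehicle overload.
    template <class T,
              std::enable_if_t<!std::is_base_of_v<GenericVehicle, T>, int> = 0>
    std::string operator()(T const&) const { return "other"; }
};

// std::visit picks the alternative currently held and calls Describe on it.
std::string describe(types_t const& t)
{
    return std::visit(Describe{}, t);
}
```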

It's very interesting to note the behavior here. Normal function overload resolution occurs, therefore:

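If you have a Car or a Cycle, those overloads are used; any other child of GenericVehicle uses the 4th version, but only if the catch-all template is constrained, because an unconstrained template deduced for, say, a Truck is an exact match and beats the derived-to-base conversion to GenericVehicle const&.
It's possible to add a template version as a catch-all, and constrain it appropriately (e.g. with boost::enable_if and boost::is_base_of) so that it only picks up the types no other overload should handle.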
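The overload-resolution rule at play can be checked in isolation. In this sketch (hypothetical types, no variant involved), an unconstrained template catch-all wins over the base-class overload for a derived type, while the base type itself still gets the non-template overload:

```cpp
#include <string>

struct GenericVehicle {};
struct Truck : GenericVehicle {};

struct Unconstrained
{
    // Needs a derived-to-base conversion when called with a Truck.
    std::string operator()(GenericVehicle const&) const { return "vehicle"; }

    // T = Truck gives an exact match, so this wins for Truck.
    template <class T>
    std::string operator()(T const&) const { return "catch-all"; }
};
```

For a GenericVehicle argument both candidates are exact matches, and the non-template one is preferred.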

Note that the non-template operator() overloads can perfectly well be defined in a .cpp file.

In order to apply this visitor, you use the boost::apply_visitor function:

types_t t;
boost::apply_visitor(DoSomething(), t);

// or

DoSomething visitor; // the one-argument form takes the visitor by non-const reference
boost::apply_visitor(visitor)(t);

The second way seems odd, but it means you can use it in a most interesting fashion, as a function object:

vector_t vec = /**/;
DoSomething visitor; // must outlive the loop: the delayed visitor keeps a reference to it
std::for_each(vec.begin(), vec.end(), boost::apply_visitor(visitor));
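std::visit has no one-argument delayed form like boost::apply_visitor(visitor), so in this sketch of the same for_each pattern a lambda plays that role. The types and the SumPrices visitor are hypothetical; the visitor is a named object so that its running total survives across elements:

```cpp
#include <algorithm>
#include <variant>
#include <vector>

struct Car   { double price; };
struct Cycle { double price; };
struct Dog   {};

using types_t  = std::variant<Car, Cycle, Dog>;
using vector_t = std::vector<types_t>;

// Stateful visitor: sums prices, ignores types without one.
struct SumPrices
{
    double total = 0.0;
    void operator()(Car const& c)   { total += c.price; }
    void operator()(Cycle const& c) { total += c.price; }
    void operator()(Dog const&)     {}
};

double totalPrice(vector_t const& vec)
{
    SumPrices visitor;  // named, so its state accumulates over the loop
    std::for_each(vec.begin(), vec.end(),
                  [&visitor](types_t const& t) { std::visit(visitor, t); });
    return visitor.total;
}
```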

Read up on Boost.Variant; it's most interesting.

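Compile-time check: forget an operator() for one of the types (with no catch-all template covering it), and the compiler complains.
No need for RTTI: no virtual pointer, no dynamic type; dispatch goes through the type index stored in the variant, so it is about as fast as a hand-written tagged union, but with increased safety.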
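On the memory question, note that while a variant needs no v-pointer, it does store a discriminator next to its largest alternative. A sketch with hypothetical types; on the common standard-library implementations this makes each element larger than the largest alternative, rounded up to its alignment (typically 24 bytes rather than 16 for the types below on x86-64), so the id-in-padding trick can still be cheaper for hundreds of millions of instances:

```cpp
#include <cstdint>
#include <variant>

struct Item    { double quantity; bool flag; };  // 16 bytes on x86-64
struct Counter { std::uint32_t count; };         // 4 bytes

// Holds either alternative plus a discriminator saying which one is active.
using Compact = std::variant<Item, Counter>;
```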

You can of course segment your code by defining multiple variants. If some sections of the code only deal with 4 or 5 types, then use a specific variant for them :)