Polymorphism or dictionary of properties?

I need to parse an XML and create objects in C++ that correspond to the XML elements, and also pack the attributes of the XML elements into those objects as properties. The XML elements/attributes and the corresponding C++ classes/properties inherit from a base object that has properties common with the derived objects.

I was thinking of using a base class for the common properties, so that derived classes would only need to define the object-specific properties. However, someone told me that standard polymorphism is not a good idea here because the solution is not generic enough: adding or changing attributes in the XML would require adding or changing the corresponding properties in the C++ classes, which means changing too much code. A much better and more abstract solution, I was told, would be to put a dictionary of attributes in the base class and access individual properties by searching the dictionary. I wanted to ask the community whether this suggestion is better for production code.

Below is an example where vehicle is the base object and the different vehicle types inherit from it. The vehicle object has the name and weight properties, which are common to all vehicles. Here is a sample XML fragment:

<car name="Toyota" weight="3500" passengers="4" />
<boat name="Yamaha" weight="3700" draft="16" />

Base class:

class vehicle {
    public:
    string name;
    string weight;
};

Derived classes:

class car : public vehicle {
    public:
    string passengers;
};

class boat : public vehicle {
    public:
    string draft;
};

Now, when my parser finds the car and boat elements I instantiate:

boat *b = new boat ();
car *c = new car ();

For the boat I can access all members simply as b->name, b->weight, and b->draft. After the C++ objects are instantiated from the XML, we do a whole bunch of object-specific work on each property, so the goal is not simply to load the XML into the program.

The alternative approach:

#define TAG_XML_ATTRIBUTE_NAME              "name"
#define TAG_XML_ATTRIBUTE_WEIGHT            "weight"
#define TAG_XML_ATTRIBUTE_PASSENGERS        "passengers"
#define TAG_XML_ATTRIBUTE_DRAFT             "draft"
. . .

class vehicle {
    public:
    map<string, string> arguments;

    // Get argument by name searching the arguments dictionary.
    string GetArgumentOrEmpty(const string& argumentName);
};

class car : public vehicle {
    public: // All properties are in the base class dictionary.
};

class boat : public vehicle {
    public: // All properties are in the base class dictionary.
};

All derived classes have the arguments map; to get a property value we search the dictionary for the property by name and then read its value, e.g. GetArgumentOrEmpty(TAG_XML_ATTRIBUTE_WEIGHT). CPUs are fast these days, so searching the dictionary will not noticeably affect performance. The namespace of elements/attributes goes from structured (as class members) to flat (as a #define list), but the names will all be in one place. The code is more abstract; although the object processing code gets more complex due to the additional indirection, it does not need to change if an attribute name changes. For example, if passengers in the XML changes to people, the #define will look a little awkward, but the code that uses it won’t have to change:

#define TAG_XML_ATTRIBUTE_PASSENGERS        "people"
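
For reference, here is a minimal sketch of what GetArgumentOrEmpty might look like, assuming the parser has already filled the arguments map. Only the declaration appears above, so the body is an assumption, and the const qualifier is added here for illustration. Note that a std::map lookup is a tree search, not a linear scan.

```cpp
#include <map>
#include <string>

class vehicle {
public:
    // Attribute name -> attribute value, as read from the XML element.
    std::map<std::string, std::string> arguments;

    // Returns the value stored under argumentName, or an empty string
    // if the element had no such attribute. (Body is an assumed sketch.)
    std::string GetArgumentOrEmpty(const std::string& argumentName) const {
        auto it = arguments.find(argumentName);
        return it != arguments.end() ? it->second : std::string();
    }
};
```

With this shape, a missing attribute and an attribute that is present but empty look the same to the caller.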

Does the dictionary-based property access method offer enough benefit to justify using it in production code instead of regular class properties? If so, is that only true for this specific XML-parsing case, or is replacing polymorphism with dictionary search, and class properties with string names, a better idea in general?

Solution

The dictionary-based approach does indeed have a couple of advantages:

It's easy to add new properties to a class/object.
The object loading/saving/creation code would be quite generic.
It's possible to make the properties that belong to classes fully configurable.

In fact, some game coding experts such as Mike McShaffry ("Game Coding Complete") advocate this kind of "Component" architecture for representing game actors, rather than deep inheritance hierarchies that are more difficult to maintain and extend.

There are a couple of inconveniences as well:

The class-specific processing code would be cluttered with getters.
This code would depend heavily on the property naming, which makes configurability less easy.
It would be easy to introduce bugs through unnoticed typos: the compiler would not spot these inconsistencies, because the names are just literal data.
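
The typo risk can be illustrated with a small sketch. The names car_typed, car_dict, and get_or_empty are hypothetical and used only for this comparison; they are not part of the question's code.

```cpp
#include <map>
#include <string>

// Typed variant: a misspelled member name is rejected at compile time.
struct car_typed {
    std::string passengers;
};

// Dictionary variant: the key is plain data, so a misspelled key
// compiles and silently yields an empty value at run time.
struct car_dict {
    std::map<std::string, std::string> arguments;

    std::string get_or_empty(const std::string& key) const {
        auto it = arguments.find(key);
        return it != arguments.end() ? it->second : std::string();
    }
};

// car_typed c; c.pasengers;  // would not compile: no such member
```

Calling get_or_empty("pasengers") on a car_dict that holds "passengers" returns an empty string without any diagnostic, which is the kind of bug the compiler cannot catch. Using the #define constants from the question reduces this risk, since a misspelled macro name is a compile error.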

So you have to make a difficult choice. My advice would be:

If you have complex class-dependent processing code, it would be less error-prone to stick to your original approach. The compiler knowing which member variables you use is a guarantee of reliability.
If you have only a couple of class-dependent routines, and the properties are mostly descriptive and handled in a similar manner, then it could make sense to take the map approach.
If you need an easily extensible and configurable object structure, the second approach would also be the more flexible one.

If you go for the map, a couple of thoughts:

You could use a dictionary that translates each external property name into an internal integer equivalent. This would speed up the map by avoiding a lot of string processing overhead.
Or you could consider using unordered_map to avoid too many string comparisons.
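
Both thoughts can be combined in one rough sketch, assuming C++14 or later so that std::hash works on enum types. The names prop_id, lookup_prop_id, set_from_xml, and get_or_empty are hypothetical, and the attribute list is taken from the question's example.

```cpp
#include <string>
#include <unordered_map>

// Internal ids for the attributes used in the question's example.
enum class prop_id { name, weight, passengers, draft };

// Maps an external XML attribute name to its internal id.
// Returns false for names the table does not know.
inline bool lookup_prop_id(const std::string& xmlName, prop_id& out) {
    static const std::unordered_map<std::string, prop_id> table = {
        {"name", prop_id::name},
        {"weight", prop_id::weight},
        {"passengers", prop_id::passengers},
        {"draft", prop_id::draft},
    };
    auto it = table.find(xmlName);
    if (it == table.end()) return false;
    out = it->second;
    return true;
}

class vehicle {
public:
    // Called by the parser for each attribute; unknown names are ignored.
    void set_from_xml(const std::string& xmlName, const std::string& value) {
        prop_id id;
        if (lookup_prop_id(xmlName, id)) properties[id] = value;
    }

    // Lookup by id: hashes an integer instead of comparing strings.
    std::string get_or_empty(prop_id id) const {
        auto it = properties.find(id);
        return it != properties.end() ? it->second : std::string();
    }

private:
    std::unordered_map<prop_id, std::string> properties;
};
```

In this sketch the string handling happens once, at load time. The processing code then refers to prop_id::weight and similar, so a misspelled id is a compile error, and renaming an XML attribute only touches the translation table.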