c++ stl class-design

Most efficient way to add data to an instance


I have a class, let's say Person, which is managed by another class/module, let's say PersonPool.

I have another module in my application, let's say module M, that wants to associate information with a person, in the most efficient way. I considered the following alternatives:

  • Add a data member to Person, which is accessed by the other part of the application. The advantage is that this is probably the fastest way. The disadvantage is that it is quite invasive: Person doesn't need to know anything about this extra data, and if I want to shield this data member from other modules, I need to make it private and make module M a friend, which I don't like.
  • Add a 'generic' property bag to Person, to which other modules can add their own properties. The advantage is that it's not invasive (apart from the property bag itself), and other modules can easily add 'properties' too. The disadvantage is that it is much slower than reading the value directly from Person.
  • Use a map/hashmap in module M that maps the Person (pointer or id) to the value we want to store. This looks like the best solution in terms of separating the data, but again it is much slower.
  • Give each person a unique number and make sure that no two persons ever get the same number over the lifetime of the application (I don't even want persons to reuse a number, because then the data of an old person may get mixed up with the data of a new person). Then the external module can simply use a vector to map the person's unique number to the specific data. The advantage is that we don't invade the Person class with data it doesn't need to know about (except its unique number), and that we have a quick way of getting the data specific to module M from the vector. The disadvantage is that the vector may become really big if lots of persons are deleted and created (because we don't want to reuse the unique numbers).

In the last alternative, the problem could be solved by using a sparse vector, but I don't know if there are very efficient implementations of a sparse vector (faster than a map/hashmap).

Are there other ways of getting this done? Or is there an efficient sparse vector that might solve the memory problem of the last alternative?


Solution

  • The first and third are reasonably common techniques. The second is how dynamic programming languages such as Python and JavaScript implement member data for objects, so do not dismiss it out of hand as impossibly slow. The fourth is in the same ballpark as how relational databases work. It is possible, but difficult, to make relational databases run like the clappers.

    In short, you've described 4 widely used techniques. The only way to rule any of them out is with details specific to your problem (required performance, number of Persons, number of properties, number of modules in your code that will want to do this, etc.), and corresponding measurements.

    Another possibility is for module M to define a class which inherits from Person and adds extra data members. The principle here is that M's idea of a person differs from Person's idea of a person, so describe M's idea as a class. Of course this only works if all other modules operating on the same Person objects are doing so via polymorphism, and furthermore if M can be made responsible for creating the objects (perhaps via dependency injection of a factory). That's quite a big "if". An even bigger one: if nothing other than M needs to do anything life-cycle-ish with the objects, then you may be able to use composition or private inheritance in preference to public inheritance. But none of this is any use if module N is going to create a collection of Persons, and then module M wants to attach extra data to them.