Search code examples
c++polymorphismnew-operatorstdvector

Achieve polymorphism with contiguous memory


I'm not facing a "problem" actually, as my code does work. I'm just curious about whether my implementations are reasonable and riskless.

I've been working on a project using C++, in which I first parse a file and then build a directed-acyclic-graph structure accordingly. Each nodes may have 0~2 out-neighbors depending on the type of the node. For different types of nodes, some functions for printing and accessing are needed, and I decided to do it using polymorphism.

My first trial was to implement it with nodes storing pointers to its out-neighbors.

class Base{
public:
  Base(){}
  virtual ~Base(){}
  virtual foo()=0;
  // ...
protected:
  unsigned _index;
}

class Derived1: public Base{
public:
  foo(){ /*Do something here...*/ }
private:
  Base* _out1;
}

class Derived2: public Base{
public:
  foo(){ /*Do something different here...*/ }
private:
  Base* _out1;
  Base* _out2;
}

int main(){
  std::vector<Base*> _nodeList;
  for(/*during parsing*/){
    if(someCondition){
      _nodeList.pushback(new Derived1);
    }
    // ...
  }
}

Since the out-neighbor of a node may be yet to define when the node is constructed, I have to add some tricks to first remember id of the out-neighbors and connect them after finishing the construction of all nodes.

However, since the number of nodes are determined given the file to parse and will not grow ever after, I consider it better to store all nodes contiguously and each node store the indices of its out-neighbors instead of pointers. This allows me to skip the connection part and also brings some minor benefits to other parts.

My current version is as follows:

// Something like this
class Base{
public:
  Base(){}
  virtual ~Base(){}
  virtual foo()=0;
  // ...
protected:
  unsigned _index;
  unsigned _out1;
  unsigned _out2;
}

class Derived1: public Base{
public:
  foo(){ /*Do something here...*/ }
}

class Derived2: public Base{
public:
  foo(){ /*Do something a little bit different here...*/ }
}

int main(){
  // EDITED!!
  // Base* _nodeList = new DefaultNode[max_num];
  Base* _nodeList = new Derived2[max_num];
  for(/*during parsing*/){
    if(someCondition){
      // EDITED!!
      // _nodeList[i] = Derived1;
      new(_nodeList+i) Derived1();
    }
    // ...
  }
}

My questions

  1. Are there any risks to store objects of different class in a newed array, given that they are all of the same size and can be destructed using a virtual destructor?

  2. I've always heard that the use of new[] should be avoided. I did found some approaches that achieve what I want using vector of union with a type tag, but it seems somewhat dirty to me. Is there a way to achieve polymorphism while storing data in a std::vector?

  3. Is the practice of using polymorphism merely to make use of the convenience of virtual functions consider a bad habit? By saying so I mean if the memory taken by each object is already the same for each derived class, then they may be merged into one single class that store its own type, and each member function can just behave according to its own type. I chose not to do so since it also looks dirty to me to have huge switch structure in each member function.

  4. Is it good to choose contiguous memory in this case? Are there any reasons that such choice may be harmful?

EDIT:

It turns out that I'm making many mistakes such as asking too many questions at a time. I think I'll first focus on the part with polymorphism and placement new. The following is a testable program of what I mean by "storing objects of different derived classes in an newed array, and it behaves on my laptop as shown below.

#include <iostream>

class Base{
public:
  Base(){}
  virtual ~Base(){}
  void virtual printType() =0;
};

class Derived1: public Base{
public:
  Derived1(){}
  void printType(){ std::cout << "Derived 1." << std::endl; }
};

class Derived2: public Base{
public:
  Derived2(){}
  void printType(){ std::cout << "Derived 2." << std::endl; }
};

int main(){
  Base* p = new Derived1[5];
  new(p+2) Derived2();
  for(unsigned i = 0; i < 5; ++i){
    (p+i)->printType();
  }
}

Outcome:

Derived 1.
Derived 1.
Derived 2.
Derived 1.
Derived 1.

Again, thanks for all the feedbacks and suggestions.


Solution

  • The problem is that normally you cannot allocate different polymorphic classes inside a vector or array - only pointers to them. So you cannot make it contiguous.

    In your case usage of polymorphism is most probably a bad idea. It will result in poor memory fragmentation and slow performance due to issues with lots of virtual calls and fails on the branch prediction. Though, if there aren't many nodes or you don't use it too frequently in your code - then it won't affect the overall performance of the program.

    To avoid this, simply store nodes' data (and make it a plain struct) inside a vector and utilize separate classes that implement those foo() functions.

    Example:

    std::vector<NodeData> nodes;
    class Method1
    {
       public:
       static void Process(NodeData& node);
       ...
    }
    
    class Method2
    {
       public:
       static void Process(NodeData& node);
       ...
    }
    

    Then you can either make a single switch to choose which method to apply or, say, store nodes' data inside several vectors so that each vector identifies which method to use.