After profiling, I found that a large portion of my program's memory is wasted by multiple virtual inheritance.
This is an MCVE to demonstrate the problem ( http://coliru.stacked-crooked.com/a/0509965bea19f8d9 ):
#include<iostream>
class Base{
    public: int id=0;
};
class B : public virtual Base{
    public: int fieldB=0;
    public: void bFunction(){
        //do something about "fieldB"
    }
};
class C : public virtual B{
    public: int fieldC=0;
    public: void cFunction(){
        //do something about "fieldC"
    }
};
class D : public virtual B{
    public: int fieldD=0;
};
class E : public virtual C, public virtual D{};
int main (){
    std::cout<<"Base="<<sizeof(Base)<<std::endl; //4
    std::cout<<"B="<<sizeof(B)<<std::endl; //16
    std::cout<<"C="<<sizeof(C)<<std::endl; //32
    std::cout<<"D="<<sizeof(D)<<std::endl; //32
    std::cout<<"E="<<sizeof(E)<<std::endl; //56
}
I would like sizeof(E) to be not much more than 16 bytes (id + fieldB + fieldC + fieldD).
From experiment, with non-virtual inheritance, E's size is 24 (MCVE).
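For comparison, here is a minimal sketch that puts the non-virtual variant next to the virtual one. The namespace names `plain` and `virt` and the helper `virtualOverhead` are only for illustration, and the numbers in the comments are the ones reported in this question; other compilers or ABIs may give different values.

```cpp
#include <cstddef>

namespace plain {                 // ordinary (non-virtual) inheritance
struct Base { int id = 0; };
struct B : Base { int fieldB = 0; };
struct C : B { int fieldC = 0; };
struct D : B { int fieldD = 0; };
struct E : C, D {};               // contains two separate B subobjects
}

namespace virt {                  // same hierarchy as the MCVE
struct Base { int id = 0; };
struct B : virtual Base { int fieldB = 0; };
struct C : virtual B { int fieldC = 0; };
struct D : virtual B { int fieldD = 0; };
struct E : virtual C, virtual D {};
}

// Extra bytes paid for virtual inheritance with the current compiler
// (56 - 24 = 32 with the sizes quoted in the question).
constexpr std::size_t virtualOverhead() {
    return sizeof(virt::E) - sizeof(plain::E);
}
```

Note that `plain::E` stores `id` and `fieldB` twice and makes `e.fieldB` ambiguous, so it is a measurement aid rather than a drop-in replacement.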
How can I reduce the size of E (by C++ magic, a change of program architecture, or a design pattern)?
Requirements:
- Base, B, C, D, E can't be template classes. That would cause a circular dependency for me.
- I must still be able to call e->bFunction() and e->cFunction() as usual. It is OK if I can't access e->bField anymore.
- I want to keep declaring "E inherit from C and D" as class E : public virtual C, public virtual D easily. I am thinking about CRTP, e.g. class E: public SomeTool<E,C,D>{}, but I am not sure how to make it work.
To make things easier:
- I never need to cast between the types, e.g. static_cast<C*>(E*) or vice versa.
- If E virtually inherits from C & D, etc., all of the above requirements are met, but I still waste a lot of memory.
- I am using C++17.
Here is a more accurate description of my real-life problem.
I am creating a game that has many components, e.g. B, C, D, E.
All of them are created via a pool, which enables fast iteration.
Currently, if I query every E from the game engine, I am able to call e->bFunction().
In my most severe case, I waste 104 bytes per object in an E-like class (the real hierarchy is more complex).
Let me try again. Here is a more meaningful class diagram.
I have a central system that already assigns hpPtr, flyPtr, entityId, componentId, and typeId automatically, i.e. don't worry about how they are initialized.
In the real case, the dreaded diamond happens in many classes; this is the simplest case.
Currently, I call it like this:
auto hps = getAllComponent<HpOO>();
for(auto ele: hps){ ele->damage(); }
auto birds = getAllComponent<BirdOO>();
for(auto ele: birds ){
    if(ele->someFunction()){
        ele->suicidalFly();
        //.... some heavy AI algorithm, etc
    }
}
With this approach, I can enjoy cache locality as in an Entity Component System, and the cool ctrl+space intellisense of HpOO, FlyableOO and BirdOO as in object-oriented style.
Everything works fine - it just uses too much memory.
EDIT: based on the latest update to the question and some chatting
Here's the most compact version that keeps virtual inheritance in all your classes.
#include <cstdint>
#include <iostream>
#include <vector>
using namespace std;
struct BaseFields {
    int entityId{};
    int16_t componentId{};
    int8_t typeId{};
    int16_t hpIdx;
    int16_t flyPowerIdx;
};
vector<int> hp; // this will contain all the hit points, dynamically resizable, logic up to you
vector<float> flyPower; // this will contain all the fly powers, dynamically resizable, logic up to you
class BaseComponent {
public: // or protected
    BaseFields data;
};
class HpOO : public virtual BaseComponent {
public:
    void damage() {
        hp[data.hpIdx] -= 1;
    }
};
class FlyableOO : public virtual BaseComponent {
public:
    void addFlyPower(float power) {
        flyPower[data.flyPowerIdx] += power; // use the fly power index, not hpIdx
    }
};
class BirdOO : public virtual HpOO, public virtual FlyableOO {
public:
    void suicidalFly() {
        damage();
        addFlyPower(5);
    }
};
int main (){
    std::cout<<"Base="<<sizeof(BaseComponent)<<std::endl; // 12
    std::cout<<"HpOO="<<sizeof(HpOO)<<std::endl; // 24
    std::cout<<"FlyableOO="<<sizeof(FlyableOO)<<std::endl; // 24
    std::cout<<"BirdOO="<<sizeof(BirdOO)<<std::endl; // 32
}
A much smaller version that drops all the virtual inheritance:
#include <cstdint>
#include <iostream>
#include <vector>
using namespace std;
vector<int> hp; // this will contain all the hit points, dynamically resizable, logic up to you
vector<float> flyPower; // this will contain all the fly powers, dynamically resizable, logic up to you
class BaseComponent {
public: // or protected
    int entityId{};
    int16_t componentId{};
    int8_t typeId{};
    int16_t hpIdx;
    int16_t flyPowerIdx;
protected:
    void damage() {
        hp[hpIdx] -= 1;
    }
    void addFlyPower(float power) {
        flyPower[flyPowerIdx] += power; // use the fly power index, not hpIdx
    }
    void suicidalFly() {
        damage();
        addFlyPower(5);
    }
};
class HpOO : public BaseComponent {
public:
    using BaseComponent::damage;
};
class FlyableOO : public BaseComponent {
public:
    using BaseComponent::addFlyPower;
};
class BirdOO : public BaseComponent {
public:
    using BaseComponent::damage;
    using BaseComponent::addFlyPower;
    using BaseComponent::suicidalFly;
};
int main (){
    std::cout<<"Base="<<sizeof(BaseComponent)<<std::endl; // 12
    std::cout<<"HpOO="<<sizeof(HpOO)<<std::endl; // 12
    std::cout<<"FlyableOO="<<sizeof(FlyableOO)<<std::endl; // 12
    std::cout<<"BirdOO="<<sizeof(BirdOO)<<std::endl; // 12
    // accessing example
    constexpr int8_t BirdTypeId = 5;
    BaseComponent x;
    if( x.typeId == BirdTypeId ) {
        // Note: formally undefined behaviour, because x is not really a BirdOO;
        // it relies on all the classes having an identical layout.
        auto y = reinterpret_cast<BirdOO *>(&x);
        y->suicidalFly();
    }
}
This example assumes your derived classes do not have overlapping functionality with diverging effects. If they do, you have to add virtual functions to your base class, for an extra overhead of 12 bytes (or 8 if you pack the class).
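To make that overhead concrete, here is a small sketch that compares the same set of fields with and without a virtual function. `PlainComponent`, `VirtualComponent`, `Bird`, and `update()` are hypothetical names used only for this illustration, and the exact difference depends on the compiler and target.

```cpp
#include <cstdint>

struct PlainComponent {           // same fields as BaseComponent above
    int entityId{};
    std::int16_t componentId{};
    std::int8_t typeId{};
    std::int16_t hpIdx{};
    std::int16_t flyPowerIdx{};
};

struct VirtualComponent {         // identical fields plus virtual functions
    int entityId{};
    std::int16_t componentId{};
    std::int8_t typeId{};
    std::int16_t hpIdx{};
    std::int16_t flyPowerIdx{};
    virtual ~VirtualComponent() = default;
    virtual int update() { return 0; }   // hypothetical overridable behaviour
};

struct Bird : VirtualComponent {
    int update() override { return 1; }  // diverging behaviour for one type
};

// The vtable pointer plus its alignment padding is the price of dispatch.
constexpr auto vptrCost = sizeof(VirtualComponent) - sizeof(PlainComponent);
```

The cost is paid once per object no matter how many virtual functions are added, since they all share the same vtable pointer.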
And quite possibly the smallest version that still keeps the virtual inheritance:
#include <cstdint>
#include <iostream>
#include <vector>
using namespace std;
struct BaseFields {
    int entityId{};
    int16_t componentId{};
    int8_t typeId{};
    int16_t hpIdx;
    int16_t flyPowerIdx;
};
#define PACKED [[gnu::packed]]
vector<int> hp; // this will contain all the hit points, dynamically resizable, logic up to you
vector<float> flyPower; // this will contain all the fly powers, dynamically resizable, logic up to you
vector<BaseFields> baseFields;
class PACKED BaseComponent {
public: // or protected
    int16_t baseFieldIdx{};
};
class PACKED HpOO : public virtual BaseComponent {
public:
    void damage() {
        hp[baseFields[baseFieldIdx].hpIdx] -= 1;
    }
};
class PACKED FlyableOO : public virtual BaseComponent {
public:
    void addFlyPower(float power) {
        flyPower[baseFields[baseFieldIdx].flyPowerIdx] += power; // use the fly power index, not hpIdx
    }
};
class PACKED BirdOO : public virtual HpOO, public virtual FlyableOO {
public:
    void suicidalFly() {
        damage();
        addFlyPower(5);
    }
};
int main (){
    std::cout<<"Base="<<sizeof(BaseComponent)<<std::endl; // 2
    std::cout<<"HpOO="<<sizeof(HpOO)<<std::endl; // 16 or 10
    std::cout<<"FlyableOO="<<sizeof(FlyableOO)<<std::endl; // 16 or 10
    std::cout<<"BirdOO="<<sizeof(BirdOO)<<std::endl; // 24 or 18
}
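The first number is for the unpacked structure, the second for the packed one.
You can also overlay hpIdx and flyPowerIdx on entityId using the union trick. Note that the anonymous struct is a compiler extension in C++, and that all three fields then share the same 4 bytes, so this only helps if the entity id can be derived from the two indices: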
union {
    int32_t entityId{};
    struct {
        int16_t hpIdx;
        int16_t flyPowerIdx;
    };
};
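Reading a union member other than the one last written is not guaranteed to work in standard C++. A more portable sketch of the same idea is explicit packing with shifts, which is a swapped-in technique rather than the union itself. The helper names `packIndices`, `hpIdxOf`, and `flyPowerIdxOf` are made up for this illustration, and the indices are assumed to be unsigned 16-bit values.

```cpp
#include <cstdint>

// Store two 16-bit indices in one 32-bit word: hpIdx in the low half,
// flyPowerIdx in the high half.
constexpr std::uint32_t packIndices(std::uint16_t hpIdx, std::uint16_t flyPowerIdx) {
    return static_cast<std::uint32_t>(hpIdx) |
           (static_cast<std::uint32_t>(flyPowerIdx) << 16);
}

constexpr std::uint16_t hpIdxOf(std::uint32_t packed) {
    return static_cast<std::uint16_t>(packed & 0xFFFFu);
}

constexpr std::uint16_t flyPowerIdxOf(std::uint32_t packed) {
    return static_cast<std::uint16_t>(packed >> 16);
}

static_assert(hpIdxOf(packIndices(7, 42)) == 7, "low half round-trips");
static_assert(flyPowerIdxOf(packIndices(7, 42)) == 42, "high half round-trips");
```

As with the union, the 32-bit word cannot also hold an independent entity id; it is either the id or the pair of indices.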
In the above example, if you do not use packing and move the whole BaseFields structure into the BaseComponent class, the sizes remain the same.
END EDIT
With typical compilers, virtual inheritance adds one pointer to each class that has a virtual base, plus whatever padding the pointer's alignment requires. You can't get around that if you actually need a virtual base class.
The question you should be asking yourself is whether you actually need it. Depending on how you access this data, that might not be the case.
If what you need is virtual dispatch rather than virtual inheritance, i.e. all common methods must be callable from every class, you can put virtual functions in the base class, use plain inheritance, and use a bit less space than your original design in the following way:
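The per-class pointer cost mentioned above can be checked in isolation with a sketch like the following. The type names `Data`, `Plain`, and `Virtual` are illustrative only, and the sizes are implementation-defined, so the comparison is written as a lower bound rather than an exact number.

```cpp
struct Data { int value = 0; };

struct Plain : Data {};            // no extra storage
struct Virtual : virtual Data {};  // adds a hidden pointer used to locate Data

// True when the virtual base costs at least one pointer on this compiler.
constexpr bool virtualCostsAtLeastOnePointer =
    sizeof(Virtual) >= sizeof(Data) + sizeof(void*);
```

On a typical 64-bit build the hidden pointer also raises the alignment of `Virtual` to 8, which is where the extra padding bytes come from.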
#include <iostream>
class Base{
    public: int id=0;
    virtual ~Base() = default;
    // virtual void Function();
};
class B : public Base{
    public: int fieldB=0;
    // void Function() override;
};
class C : public B{
    public: int fieldC=0;
};
class D : public B{
    public: int fieldD=0;
};
// Note: E now contains two B (and Base) subobjects, one via C and one via D.
class E : public C, public D{
};
int main (){
    std::cout<<"Base="<<sizeof(Base)<<std::endl; //16
    std::cout<<"B="<<sizeof(B)<<std::endl; // 16
    std::cout<<"C="<<sizeof(C)<<std::endl; // 24
    std::cout<<"D="<<sizeof(D)<<std::endl; // 24
    std::cout<<"E="<<sizeof(E)<<std::endl; // 48
}
If cache misses are the bottleneck and the CPU has cycles to spare, you can further decrease the size by using compiler-specific attributes to make the data structure as small as possible (the next example works in GCC):
#include<iostream>
class [[gnu::packed]] Base {
public:
    int id=0;
    virtual ~Base() = default;
    virtual void bFunction() { /* do nothing */ }
    virtual void cFunction() { /* do nothing */ }
};
class [[gnu::packed]] B : public Base{
    public: int fieldB=0;
    void bFunction() override { /* implementation */ }
};
class [[gnu::packed]] C : public B{
    public: int fieldC=0;
    void cFunction() override { /* implementation */ }
};
class [[gnu::packed]] D : public B{
    public: int fieldD=0;
};
class [[gnu::packed]] E : public C, public D{
};
int main (){
    std::cout<<"Base="<<sizeof(Base)<<std::endl; // 12
    std::cout<<"B="<<sizeof(B)<<std::endl; // 16
    std::cout<<"C="<<sizeof(C)<<std::endl; // 20
    std::cout<<"D="<<sizeof(D)<<std::endl; // 20
    std::cout<<"E="<<sizeof(E)<<std::endl; //40
}
This saves an additional 8 bytes (E shrinks from 48 to 40), possibly at the price of some CPU overhead for unaligned accesses, but it might help if memory is the issue.
Additionally, if there is really a single function you call for each of your classes, you should declare it as a single virtual function and override it wherever necessary.
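As a smaller illustration of what the packed attribute does, the sketch below compares a padded and a packed struct. It assumes GCC or Clang, since `[[gnu::packed]]` is not standard C++, and the names `Unpacked`, `Packed`, and `readValue` are hypothetical.

```cpp
#include <cstdint>

struct Unpacked {
    std::uint8_t tag;
    std::uint32_t value;   // preceded by 3 bytes of padding
};

struct [[gnu::packed]] Packed {   // GCC/Clang only
    std::uint8_t tag;
    std::uint32_t value;   // stored at offset 1, i.e. misaligned
};

// Access the member through the struct so the compiler knows it may be
// misaligned; avoid taking a pointer or reference to p.value itself.
inline std::uint32_t readValue(const Packed& p) {
    return p.value;
}
```

The possible CPU overhead comes from those misaligned loads and stores, whose cost varies by architecture, so it is worth measuring before committing to packing.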
#include<iostream>
class [[gnu::packed]] Base {
public:
    virtual ~Base() = default;
    virtual void specificFunction() { /* implementation for Base class */ }
    int id=0;
};
class [[gnu::packed]] B : public Base{
public:
    void specificFunction() override { /* implementation for B class */ }
    int fieldB=0;
};
class [[gnu::packed]] C : public B{
public:
    void specificFunction() override { /* implementation for C class */ }
    int fieldC=0;
};
class [[gnu::packed]] D : public B{
public:
    void specificFunction() override { /* implementation for D class */ }
    int fieldD=0;
};
class [[gnu::packed]] E : public C, public D{
public:
    void specificFunction() override {
        // implementation for E class, example:
        C::specificFunction();
        D::specificFunction();
    }
};
This would also save you from having to figure out which class each object belongs to before calling the appropriate function.
Furthermore, assuming your original virtual inheritance design is what works best for your application, you could restructure your data so that it is more cache-friendly, which also decreases the size of your classes while keeping your functions accessible:
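A minimal sketch of what that looks like at the call site is shown below. It uses a simplified, non-diamond hierarchy and makes `specificFunction` return a string only so the effect is observable; `runAll` is a hypothetical helper, not part of the code above.

```cpp
#include <memory>
#include <string>
#include <vector>

struct Base {
    virtual ~Base() = default;
    virtual std::string specificFunction() const { return "Base"; }
};
struct B : Base {
    std::string specificFunction() const override { return "B"; }
};
struct C : B {
    std::string specificFunction() const override { return "C"; }
};

// Calls the right override for every element without inspecting its type.
inline std::string runAll(const std::vector<std::unique_ptr<Base>>& objects) {
    std::string log;
    for (const auto& obj : objects) {
        log += obj->specificFunction();
        log += ' ';
    }
    return log;
}
```

The caller never needs a type id or a cast; the vtable pointer stored in each object selects the implementation.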
#include <array>
#include <cstdint>
#include <iostream>
using namespace std;
struct BaseFields {
    int id{0};
};
struct BFields {
    int fieldB;
};
struct CFields {
    int fieldC;
};
struct DFields {
    int fieldD;
};
array<BaseFields, 1024> baseData;
array<BFields, 1024> bData;
array<CFields, 1024> cData;
array<DFields, 1024> dData;
struct indexes {
    uint16_t baseIndex; // index where data for Base class is stored in baseData array
    uint16_t bIndex; // index where data for B class is stored in bData array
    uint16_t cIndex; // index where data for C class is stored in cData array
    uint16_t dIndex; // index where data for D class is stored in dData array
};
class Base{
public: // or protected, so that derived classes can reach the indexes
    indexes data;
};
class B : public virtual Base{
    public: void bFunction(){
        //do something about "fieldB", e.g. bData[data.bIndex].fieldB
    }
};
class C : public virtual B{
    public: void cFunction(){
        //do something about "fieldC", e.g. cData[data.cIndex].fieldC
    }
};
class D : public virtual B{
};
class E : public virtual C, public virtual D{};
int main (){
    std::cout<<"Base="<<sizeof(Base)<<std::endl; // 8
    std::cout<<"B="<<sizeof(B)<<std::endl; // 16
    std::cout<<"C="<<sizeof(C)<<std::endl; // 16
    std::cout<<"D="<<sizeof(D)<<std::endl; // 16
    std::cout<<"E="<<sizeof(E)<<std::endl; // 24
}
Obviously this is just an example, and it assumes you never have more than 1024 objects at a time. You can increase that number, but above 65536 you would have to use a bigger integer type to store the indexes; conversely, with 256 objects or fewer you can use uint8_t.
Furthermore, if one of the structures above adds very little to its parent, you could reduce the number of arrays you use to store the data. If there is very little difference in the size of the objects, you can store all the data in a single structure and get more localized memory accesses. That all depends on your application, so I can't give more advice here other than to benchmark what works best for your case.
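The index-width limit can be enforced in code rather than by convention. The following is a sketch under the assumption that the data lives in a growable pool; `IndexedPool` is a hypothetical helper, not something from the question, and it is a template only for the storage, not for the component classes themselves.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical pool: stores values contiguously and hands out small indices.
template <class T, class Index = std::uint16_t>
class IndexedPool {
    static_assert(sizeof(Index) < sizeof(std::size_t),
                  "Index must be narrower than std::size_t");
public:
    // Returns the index of the new element, or std::nullopt once the
    // index type can no longer address another slot.
    std::optional<Index> add(const T& value) {
        constexpr std::size_t maxCount =
            static_cast<std::size_t>(std::numeric_limits<Index>::max()) + 1;
        if (items_.size() >= maxCount) return std::nullopt;
        items_.push_back(value);
        return static_cast<Index>(items_.size() - 1);
    }
    T& operator[](Index i) { return items_[i]; }
    std::size_t size() const { return items_.size(); }
private:
    std::vector<T> items_;
};
```

With `Index = std::uint8_t` the pool accepts 256 elements and then refuses further insertions, which makes it obvious when a wider index type is needed.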
Have fun and enjoy C++.