Tags: c++, architecture, encapsulation, data-oriented-design

Should types have methods in data oriented design?


Currently, my application consists of three kinds of classes, described below. It is meant to follow a data-oriented design; please correct me if it doesn't. The code examples are not that important, and you can skip them if you want; they are only there to give an impression. My question is: should I add methods to my type classes?

Current design

Types just hold values.

struct Person {
    Person() : Walking(false), Jumping(false) {}
    float Height, Mass;
    bool Walking, Jumping;
};

Modules each implement one distinct piece of functionality. They can access all types, since those are stored globally.

class Renderer : public Module {
public:
    void Init() {
        // init opengl and glew
        // ...
    }
    void Update() {
        // fetch all instances of one type
        std::unordered_map<uint64_t, Model*> models = Entity->Get<Model>();
        for (const auto& i : models) {
            uint64_t id = i.first;
            Model *model = i.second;
            // fetch single instance by id
            Transform *transform = Entity->Get<Transform>(id);
            // transform model and draw
            // ...
        }
    }
private:
    float time;
};

Managers are helpers that get injected into modules via the base Module class. The Entity used above is an instance of an entity manager. Other managers cover messaging, file access, SQL storage, and so on; in short, any functionality that should be shared among modules.

class ManagerEntity {
public:
    uint64_t New() {
        // generate and return new id
        // ...
    }
    template <typename T>
    void Add(uint64_t Id) {
        // attach new property to given id
        // ...
    }
    template <typename T>
    T* Get(uint64_t Id) {
        // return property attached to id
        // ...
    }
    template <typename T>
    std::unordered_map<uint64_t, T*> Get() {
        // return unordered map of all instances of that type
        // ...
    }
};
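For readers who want something concrete, here is a minimal, hypothetical sketch of how the elided ManagerEntity bodies could be filled in. It is not the asker's implementation; it assumes each component type gets its own hash map from id to a heap-allocated component, created on first use. Note that storing every component behind its own pointer in a hash map is convenient but scatters data across memory, which is exactly the kind of layout data-oriented design tries to avoid.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Hypothetical implementation of the ManagerEntity interface from the question.
// Each component type T gets its own id -> T table, created on first use.
class ManagerEntity {
public:
    std::uint64_t New() { return next_id_++; }

    template <typename T>
    void Add(std::uint64_t Id) {
        assert(Id > 0 && Id < next_id_ && "id must come from New()");
        table<T>().items[Id] = std::make_unique<T>();
    }

    // Returns nullptr if no T is attached to Id.
    template <typename T>
    T* Get(std::uint64_t Id) {
        auto& items = table<T>().items;
        auto it = items.find(Id);
        return it == items.end() ? nullptr : it->second.get();
    }

    // Returns non-owning pointers to every T, keyed by entity id.
    template <typename T>
    std::unordered_map<std::uint64_t, T*> Get() {
        std::unordered_map<std::uint64_t, T*> result;
        for (auto& [id, component] : table<T>().items)
            result.emplace(id, component.get());
        return result;
    }

private:
    // Type-erased base so tables of different T can live in one map.
    struct TableBase {
        virtual ~TableBase() = default;
    };
    template <typename T>
    struct Table : TableBase {
        std::unordered_map<std::uint64_t, std::unique_ptr<T>> items;
    };

    template <typename T>
    Table<T>& table() {
        auto& slot = tables_[std::type_index(typeid(T))];
        if (!slot) slot = std::make_unique<Table<T>>();
        return static_cast<Table<T>&>(*slot);
    }

    std::unordered_map<std::type_index, std::unique_ptr<TableBase>> tables_;
    std::uint64_t next_id_ = 1;
};
```

Usage mirrors the question: `Entity->Get<Model>()` picks the zero-argument overload and `Entity->Get<Transform>(id)` the single-id one.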

Problem with it

Now you've got an idea of my current design. Consider the case where a type needs more complicated initialization. For example, the Model type just stores OpenGL ids for its textures and vertex buffers; the actual data must be uploaded to the video card beforehand.

struct Model {
    // vertex buffers
    GLuint Positions, Normals, Texcoords, Elements;
    // textures
    GLuint Diffuse, Normal, Specular;
    // further material properties
    GLfloat Shininess;
};

Currently, there is a Models module with a Create() function that takes care of setting up a model. But this way, I can only create models from this module, not from others. Should I move this to the type class Model while complexifying it? Until now, I thought of the type definitions as just an interface.


Solution

  • First, you don't necessarily need to apply data-oriented design everywhere. It's ultimately an optimization, and even a performance-critical codebase still has a whole lot of parts which don't benefit from it.

    I often think of it as obliterating structure in favor of big blocks of data that are more efficient to process. Take an image, for example. Representing its pixels efficiently generally requires a simple array of numeric values, not, to take an exaggerated example, a collection of user-defined abstract pixel objects that each carry a virtual pointer.

    Imagine a 4-component (RGBA) image that uses 32-bit floats for the color channels but, for whatever reason, only 8 bits for alpha (sorry, it's kind of a goofy example). Even a basic struct for the pixel type would normally require considerably more memory because of the structure padding needed for alignment: on typical platforms, 13 bytes of data per pixel become 16.

    struct Image
    {
        struct Pixel
        {
            float r;
            float g;
            float b;
            unsigned char alpha;
            // some padding (3 bytes, e.g., assuming 32-bit alignment
            // for floats and 8-bit alignment for unsigned char)
        };
        vector<Pixel> Pixels;
    };
    

    Even in this simple case, turning it into a flat array of floats with a parallel array of 8-bit alphas reduces the memory footprint and, as a result, potentially improves sequential access speed.

    struct Image
    {
        vector<float> rgb;
        vector<unsigned char> alpha;
    };
    

    ... and that's how we should be thinking initially: about data and memory layouts. Of course, images are already typically represented efficiently, and image processing algorithms are already implemented to process a large number of pixels in bulk.
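    To make the padding argument concrete, the sketch below compares the two layouts. The names PixelAoS and ImageSplit are hypothetical. On typical platforms (4-byte float with 4-byte alignment), sizeof(PixelAoS) is 16 while the split layout stores 13 bytes per pixel; the exact numbers are implementation-defined, which is why they are computed rather than hard-coded.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// AoS: one struct per pixel. 3 floats + 1 byte of alpha is 13 bytes of data,
// but the struct size is rounded up to a multiple of alignof(float).
struct PixelAoS {
    float r, g, b;
    unsigned char alpha;
};

// Split layout: a flat float array (3 floats per pixel) plus parallel alphas.
struct ImageSplit {
    std::vector<float> rgb;            // r0 g0 b0 r1 g1 b1 ...
    std::vector<unsigned char> alpha;  // a0 a1 ...

    explicit ImageSplit(std::size_t pixel_count)
        : rgb(pixel_count * 3), alpha(pixel_count) {}

    std::size_t pixel_count() const {
        assert(rgb.size() == alpha.size() * 3);  // arrays must stay in sync
        return alpha.size();
    }
};

// Bytes of storage per pixel in each layout.
constexpr std::size_t aos_bytes_per_pixel = sizeof(PixelAoS);
constexpr std::size_t split_bytes_per_pixel =
    3 * sizeof(float) + sizeof(unsigned char);
```

    The split layout never pays for padding, no matter how many pixels the image holds.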

    Yet data-oriented design takes this to a further level than usual by applying this kind of representation even to things that are considerably higher-level than a pixel. In a similar way, you might benefit from modeling a ParticleSystem instead of a single Particle to leave such breathing room for optimizations, or even People instead of Person.
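    A minimal sketch of the ParticleSystem idea, assuming hypothetical 2D particles with only position and velocity: callers talk to the system as a whole, so the choice of separate x/y/vx/vy arrays stays a private detail that could change later without touching them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ParticleSystem: the aggregate, not a single Particle, is the
// unit of design, so the structure-of-arrays layout stays an internal choice.
class ParticleSystem {
public:
    void add(float x, float y, float vx, float vy) {
        x_.push_back(x);
        y_.push_back(y);
        vx_.push_back(vx);
        vy_.push_back(vy);
    }

    // Advances every particle at once in a tight loop over contiguous floats.
    void update(float dt) {
        assert(x_.size() == vx_.size() && y_.size() == vy_.size());
        for (std::size_t i = 0; i < x_.size(); ++i) {
            x_[i] += vx_[i] * dt;
            y_[i] += vy_[i] * dt;
        }
    }

    std::size_t size() const { return x_.size(); }
    float x(std::size_t i) const { return x_[i]; }
    float y(std::size_t i) const { return y_[i]; }

private:
    std::vector<float> x_, y_, vx_, vy_;
};
```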

    But let's come back to the image example. This would tend to imply a lack of DOD:

    struct Image
    {
        struct Pixel
        {
            // Adjust the brightness of this pixel.
            void adjust_brightness(float amount);
    
            float r;
            float g;
            float b;
        };
        vector<Pixel> Pixels;
    };
    

    The problem with this adjust_brightness method is that it is designed, from an interface standpoint, to work on a single pixel. This can make it difficult to apply optimizations and algorithms which benefit from having access to multiple pixels at once. Meanwhile, something like this:

    struct Image
    {
        vector<float> rgb;
    };
    void adjust_brightness(Image& img, float amount);
    

    ... can be written in a way that benefits from accessing multiple pixels at once. We might even represent it like this, with a structure-of-arrays (SoA) representation:

    struct Image
    {
        vector<float> r;
        vector<float> g;
        vector<float> b;
    };
    

    ... which might be optimal if your hotspots involve sequential processing. The details don't matter so much. What's important to me is that your design leaves breathing room to optimize. For me, the value of DOD is that putting this kind of thought in upfront gives you interface designs that let you optimize later, as needed, without intrusive design changes.
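    Here is one way the aggregate adjust_brightness could look over the SoA Image, assuming (hypothetically) that brightness is an additive offset clamped to [0, 1]. The point is only that the function sees whole channels at a time, so its loop could later be vectorized or parallelized without changing any callers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Image {
    std::vector<float> r;
    std::vector<float> g;
    std::vector<float> b;
};

// Works on the whole image, not one pixel. Assumption: brightness is an
// additive offset and channel values are kept within [0, 1].
void adjust_brightness(Image& img, float amount) {
    assert(img.r.size() == img.g.size() && img.g.size() == img.b.size());
    auto adjust = [amount](std::vector<float>& channel) {
        for (float& v : channel) v = std::clamp(v + amount, 0.0f, 1.0f);
    };
    adjust(img.r);
    adjust(img.g);
    adjust(img.b);
}
```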

    Polymorphism

    The classic example of polymorphism also tends to focus on that granular one-thing-at-a-time mindset, like Dog inherits Mammal. In games, that can sometimes lead to bottlenecks where developers start having to fight against the type system: sorting polymorphic base pointers by subtype to improve temporal locality on the vtable, using custom allocators to store instances of a particular subtype (Dog, e.g.) contiguously to improve spatial locality, etc.

    None of these burdens need be there if we model at a coarser level. You can have Dogs inheriting abstract Mammals. Now the cost of virtual dispatch is reduced to once per type of mammal, not once per mammal, and all mammals of a particular type can be represented efficiently and contiguously.

    You can still get fancy and use OOP and polymorphism with a DOD mindset. The trick is to design things at a coarse enough level that you never have to fight the type system or work around your data types to regain control over things like memory layouts.
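    A minimal sketch of the coarse-grained version, with hypothetical per-animal data (a hunger value for dogs, an energy value for cats): the virtual interface is per group, each group stores its members contiguously, and a frame update costs one virtual call per mammal type rather than per mammal.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Abstract interface for a whole *group* of mammals, not a single mammal.
class Mammals {
public:
    virtual ~Mammals() = default;
    virtual void update_all(float dt) = 0;
    virtual std::size_t count() const = 0;
};

// Every dog lives in one contiguous array; no per-dog vtable pointer.
class Dogs : public Mammals {
public:
    void add(float hunger) { hunger_.push_back(hunger); }
    void update_all(float dt) override {
        for (float& h : hunger_) h += dt;  // hypothetical rule: dogs get hungrier
    }
    std::size_t count() const override { return hunger_.size(); }
    float hunger(std::size_t i) const {
        assert(i < hunger_.size());
        return hunger_[i];
    }

private:
    std::vector<float> hunger_;
};

class Cats : public Mammals {
public:
    void add(float energy) { energy_.push_back(energy); }
    void update_all(float dt) override {
        for (float& e : energy_) e -= dt;  // hypothetical rule: cats tire out
    }
    std::size_t count() const override { return energy_.size(); }

private:
    std::vector<float> energy_;
};

// One virtual call per mammal *type* per frame, however many mammals exist.
void update_world(std::vector<std::unique_ptr<Mammals>>& groups, float dt) {
    for (auto& group : groups) group->update_all(dt);
}
```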

    Interface Design

    As far as I see it, there is still interface design involved in DOD, and you can have methods in your classes. It's still very important to design proper high-level interfaces, and you can still use virtual functions and templates and get very abstract. The practical difference I'd focus on is that you design aggregate interfaces, as in the case of the adjust_brightness function above, which leave you the breathing room to optimize without cascading design changes throughout your codebase: we design an interface that processes all the pixels of an image instead of one that processes a single pixel at a time.

    DOD interfaces are often designed to process data in bulk, typically with a memory layout that is optimal for the most performance-critical, linear-complexity sequential loops that have to access everything.

    So if we take your example with Model, what's missing is the aggregate side of the interface.

    struct Models {
        // Methods to process models in bulk can go here.
    
        struct Model {
            // vertex buffers
            GLuint Positions, Normals, Texcoords, Elements;
            // textures
            GLuint Diffuse, Normal, Specular;
            // further material properties
            GLfloat Shininess;
        };
    
        std::vector<Model> models;
    };
    

    This doesn't strictly have to be represented using a class with methods. It could be a function that accepts an array of structs. These details don't really matter so much; what matters is that the interface is mostly designed to process sequentially in bulk, while the data representation is designed optimally for that case.
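    Taking the "function that accepts an array of structs" route, a sketch might look like the following. The GLuint/GLfloat aliases are stand-ins so it compiles without OpenGL headers, and scale_shininess and diffuse_textures are hypothetical operations chosen only to show the bulk-processing shape of the interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins so the sketch compiles without OpenGL headers (assumption: the
// real GLuint/GLfloat come from your GL loader).
using GLuint = unsigned int;
using GLfloat = float;

struct Model {
    GLuint Positions, Normals, Texcoords, Elements;  // vertex buffers
    GLuint Diffuse, Normal, Specular;                // textures
    GLfloat Shininess;                               // material property
};

// Hypothetical bulk operation: scale every model's shininess in one pass.
void scale_shininess(std::vector<Model>& models, GLfloat factor) {
    assert(factor >= 0.0f && "shininess exponents stay non-negative");
    for (Model& m : models) m.Shininess *= factor;
}

// Hypothetical bulk query: collect all diffuse texture ids, e.g. to sort
// draws by texture before rendering.
std::vector<GLuint> diffuse_textures(const std::vector<Model>& models) {
    std::vector<GLuint> ids;
    ids.reserve(models.size());
    for (const Model& m : models) ids.push_back(m.Diffuse);
    return ids;
}
```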

    Hot/Cold Splitting

    Looking at your Person class, you might still be thinking somewhat in terms of classical interfaces (even though the interface here is just data). Again, DOD would primarily use a struct for a whole thing only if that were the optimal memory layout for the most performance-critical loops. It's not about logical organization for humans; it's about data organization for machines.

    struct Person {
        Person() : Walking(false), Jumping(false) {}
        float Height, Mass;
        bool Walking, Jumping;
    };
    

    First let's put this in context:

    struct People {
        struct Person {
            Person() : Walking(false), Jumping(false) {}
            float Height, Mass;
            bool Walking, Jumping;
        };
        vector<Person> persons;
    };
    

    In this case, are all the fields often accessed together? Let's say, hypothetically, that the answer is no. These Walking and Jumping fields are accessed only sometimes (cold), while Height and Mass are accessed all the time repeatedly (hot). In this case, a potentially more optimal representation might be:

    struct People {
        vector<float> HeightMass;
        vector<bool> WalkingJumping;
    };
    

    Of course, you can make two separate structs here, have one point to the other, etc. The key is that you design this ultimately from a memory layout/performance standpoint, ideally with a good profiler in hand and a solid understanding of the common user-end code paths.

    From an interface standpoint, you design the interface with a focus towards processing people, not a person.
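    One possible shape for the "two separate structs" variant, assuming (as in the hypothetical above) that Height and Mass are hot while Walking and Jumping are cold. The People class and its total_mass, set_walking, and walking methods are illustrative names only; note that the interface is about people, and the hot loop streams only the Height/Mass records.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical People container with hot/cold splitting.
class People {
public:
    // Returns the index of the new person.
    std::size_t add(float height, float mass) {
        hot_.push_back({height, mass});
        cold_.push_back({false, false});
        return hot_.size() - 1;
    }

    // Hot path: touches only the Height/Mass array, never the flags.
    float total_mass() const {
        float sum = 0.0f;
        for (const Hot& h : hot_) sum += h.Mass;
        return sum;
    }

    // Cold path: flags live in their own array and are read rarely.
    void set_walking(std::size_t i, bool walking) {
        assert(i < cold_.size());
        cold_[i].Walking = walking;
    }
    bool walking(std::size_t i) const { return cold_[i].Walking; }

private:
    struct Hot { float Height, Mass; };
    struct Cold { bool Walking, Jumping; };
    std::vector<Hot> hot_;    // accessed all the time
    std::vector<Cold> cold_;  // accessed only sometimes
};
```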

    The Problem

    With that out of the way, on to your problem:

    I can only create models from this module, not from others. Should I move this to the type class Model while complexifying it?

    This is more of a subsystem design concern. Since your Model representation is all about OpenGL data, it should probably belong in the module that can properly initialize/destroy/render it. It might even be a private/hidden implementation detail of this module, at which point you apply a DOD mindset within the module's implementation.

    The interface available to the outside world to add models, destroy models, render them, etc. should ultimately be designed for bulk, however. Think of it as designing a high-level interface for a container where the methods you would be tempted to add for each element instead end up belonging to the container, as in our image example above with adjust_brightness.

    Complex initialization/destruction often needs a one-at-a-time design mentality, but the key is that you do it through an aggregate interface. Here you might still forgo the standard constructor and destructor for a Model, instead initializing it when a GPU model is added for rendering and cleaning up its GPU resources when it is removed from the list. It's somewhat back to C-style coding for the individual type (person, e.g.), though you can still get very sophisticated with C++ goodies for the aggregate interface (people, e.g.).
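    A rough sketch of that idea, under loudly stated assumptions: MeshData is a hypothetical CPU-side description that any module can pass in, and upload_buffer/release_buffer/draw are hypothetical stand-ins for the real GL calls (glGenBuffers/glBufferData, glDeleteBuffers, glDrawArrays) so the code runs without a GL context. The individual GpuModel is a plain record; setup and cleanup happen in the aggregate's add/remove, and the aggregate's destructor releases whatever is left.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using GLuint = unsigned int;  // stand-in for the OpenGL typedef

// Hypothetical CPU-side mesh description that any module can hand over.
struct MeshData {
    std::vector<float> positions;  // x, y, z per vertex
};

class Models {
public:
    Models() = default;
    Models(const Models&) = delete;             // owns GPU resources,
    Models& operator=(const Models&) = delete;  // so no accidental copies
    ~Models() {
        for (const GpuModel& m : models_) release_buffer(m.Positions);
    }

    // Initialization happens when a model is added to the aggregate...
    std::size_t add(const MeshData& mesh) {
        GpuModel m{};
        m.Positions = upload_buffer(mesh.positions);
        m.VertexCount = mesh.positions.size() / 3;
        models_.push_back(m);
        return models_.size() - 1;
    }

    // ...and cleanup when it is removed. Swap-and-pop keeps the array dense,
    // but moves the last model into the freed slot (its index changes).
    void remove(std::size_t index) {
        assert(index < models_.size());
        release_buffer(models_[index].Positions);
        models_[index] = models_.back();
        models_.pop_back();
    }

    void render_all() const {
        for (const GpuModel& m : models_) draw(m);
    }

    std::size_t size() const { return models_.size(); }
    std::size_t live_buffers() const { return live_buffers_; }

private:
    // Plain C-style record for the individual model: no ctor/dtor logic.
    struct GpuModel {
        GLuint Positions;
        std::size_t VertexCount;
    };

    // Hypothetical stand-ins for real GL calls.
    GLuint upload_buffer(const std::vector<float>&) {
        ++live_buffers_;
        return next_id_++;
    }
    void release_buffer(GLuint) { --live_buffers_; }
    void draw(const GpuModel&) const {}

    std::vector<GpuModel> models_;
    GLuint next_id_ = 1;
    std::size_t live_buffers_ = 0;
};
```

    Because removal reorders the array, indices returned by add are only stable until the next remove; a real module might hand out stable handles instead, but that is a separate design choice.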

    My question is, should I add methods to my type classes?

    Mainly design for bulk, and you should be on your way. In the examples you showed, typically no. It doesn't have to be a hard rule, but your types model individual things, and leaving room for DOD often requires zooming out and designing interfaces that deal with many things.