I am working on C++ code to simulate N interacting particles (as in physics). The particles move in d dimensions (d = 1, 2, or 3). In each iteration of the algorithm, forces are computed from all pairwise particle distances (the most expensive part in terms of runtime), so efficiency is key in these simulations. In my current setup, I have three different code versions, one for each spatial dimension d. The particle positions, velocities, and forces are stored in a struct like
struct coordinate {
    double x, y, z; // 3 coordinates for the 3-dimensional case.
};
I now wanted to fuse these three code versions so that the dimension can be entered at the start of the simulation via argv. Obviously, the dimension, and hence the number of double values in the struct, is then not known at compile time. The question is how to write the fused version without losing efficiency.
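For context, the dimension is read at startup roughly like this (simplified; names are illustrative):

#include <cstddef>
#include <cstdlib>
#include <stdexcept>

int main(int argc, char* argv[]) {
    if (argc < 2) throw std::invalid_argument("usage: sim <dimension>");
    const size_t d = std::strtoul(argv[1], nullptr, 10); // runtime dimension
    // ... set up and run the simulation with dimension d ...
}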
First, I naively removed the struct entirely and replaced it with a std::vector<double> whose size is given by the dimension d. However, when benchmarking the code, accessing the vector elements in the tight loops of my simulation and constructing new vectors was much more expensive than simply working with the structs, which surprised me. Apparently, this is because std::vector allocates its storage dynamically, unlike the struct. Furthermore, with a 2D vector std::vector<std::vector<double>>, where the rows correspond to the particles and the columns to the (x,y,z) coordinates (in 3D), the inner vectors are not stored next to each other in memory (as opposed to the struct version std::vector<coordinate> with the struct above, where the (x,y,z) of all particles are stored contiguously).
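This is easy to verify; a small test along these lines makes the layout difference visible:

#include <cstdio>
#include <vector>

struct coordinate { double x, y, z; };

int main() {
    std::vector<std::vector<double>> vv(2, std::vector<double>(3, 0.0));
    std::vector<coordinate> vc(2);
    // Each inner vector owns its own heap allocation, so consecutive
    // particles' coordinates are generally far apart:
    std::printf("%p %p\n", (void*)vv[0].data(), (void*)vv[1].data());
    // The struct version packs all particles back to back:
    std::printf("%p %p\n", (void*)&vc[0], (void*)&vc[1]);
}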
The next idea was to work with std::array instead: I would always define an array of some maximum admissible dimension MAXDIM (known at compile time) and then only operate on the array elements given by the dimension d entered at runtime. For this, I created a new version of the coordinate struct:
#include <array>
#include <cstddef>
#include <stdexcept>

constexpr size_t MAXDIM = 3; // Maximum allowed dimension.

class coordinate {
public:
    // Default constructor.
    coordinate()
        : dimension {0} {
        coords.fill(0.0); // Initialize all elements to 0.0.
    }

    coordinate(size_t dim) {
        set_dimension(dim);
        coords.fill(0.0);
    }

    // Method to set the runtime dimension.
    void set_dimension(size_t dim) {
        if (dim > MAXDIM) {
            throw std::out_of_range("Dimension exceeds MAXDIM.");
        }
        dimension = dim;
    }

    // Access operators for reading/writing within the runtime dimension
    // (no bounds check, to keep the tight loops fast).
    double& operator[](size_t index) {
        return coords[index];
    }
    const double& operator[](size_t index) const {
        return coords[index];
    }

private:
    std::array<double, MAXDIM> coords; // Fixed-size array.
    size_t dimension;                  // Actual dimension.
};
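In the tight loops, this is then used roughly as follows (simplified; in the real code d comes from argv):

const size_t d = 2;  // runtime dimension, d <= MAXDIM
coordinate r(d);     // construct with the runtime dimension
for (size_t k = 0; k < d; ++k)
    r[k] = 0.0;      // only the first d slots are ever touched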
If I use this code version and enable full compiler optimizations (for g++, the -O3 flag), the resulting code is exactly as fast as the original when using 2 dimensions, which is great. What surprised me, however, is that when using only 1 dimension, the code is still 20 to 30% slower than the original version.
ChatGPT suggested that this is because in the original version, where the coordinates were stored as plain double variables, the compiler recognizes in the 1D case that the struct holds only a single scalar and can apply certain scalar optimizations that are unavailable once that scalar is stored in an array. In the 2D case, by contrast, the workload is identical whether the two numbers are stored in an array or as separate double values, because these scalar optimizations no longer play a role.
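For reference, the fused struct is also considerably larger per particle than the original 1D struct (sizes assume a typical 64-bit target):

struct coordinate1D { double x; }; // original 1D version: 8 bytes, a plain scalar

// Fused version above: 32 bytes per particle on a typical 64-bit target,
// of which 16 bytes are unused array slots in 1D and 8 bytes repeat the
// same dimension value in every particle.
static_assert(sizeof(coordinate) == 4 * sizeof(double));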
If one only worked with the 1D case, one would obviously not store the single coordinate in an array, but I want my code to work in all dimensions without recompiling (since I want to turn it into an app).
Without seeing more of the codebase, does anyone have an idea whether ChatGPT is right, and how to solve this problem? I have already profiled both code versions with perf but could not find noteworthy differences (I can post the perf results if necessary).
This is what templates are for: write the entire simulation in terms of template<unsigned short D> (very much without storing the dimension in each particle!) and then have main branch between the choices using very simple, if tedious, code like
switch (runtime_dimension) {
case 1: return run<1>(initial_state);
case 2: return run<2>(initial_state);
case 3: return run<3>(initial_state);
default: std::abort(); // die horribly
}
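Each instantiation then sees D as a compile-time constant, so coordinate loops have fixed trip counts and the 1D case degenerates back to a single double. A minimal sketch of what the templated state might look like (names are illustrative, not your actual code):

#include <array>
#include <vector>

template <unsigned short D>
struct coordinate {
    std::array<double, D> c{}; // exactly D doubles, no runtime dimension stored
};

template <unsigned short D>
int run(const std::vector<coordinate<D>>& initial_state) {
    std::vector<coordinate<D>> forces(initial_state.size());
    // ... the whole simulation; every inner loop is
    // `for (unsigned short k = 0; k < D; ++k)`, which the optimizer
    // can fully unroll because D is a constant ...
    return 0;
}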
It’s possible to avoid the switch with an array of function pointers and std::make_integer_sequence, but it’s not worth it for 3 choices.
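For completeness, if you ever want that alternative, it could look roughly like this (C++17, with a stand-in run):

#include <array>
#include <cstdio>
#include <utility>

// Stand-in for the real simulation entry point.
template <unsigned short D>
int run() {
    std::printf("running in %d dimensions\n", static_cast<int>(D));
    return 0;
}

// Build the table {&run<1>, &run<2>, &run<3>} from an index sequence.
template <unsigned short... Is>
constexpr auto make_table(std::integer_sequence<unsigned short, Is...>) {
    return std::array{&run<Is + 1>...};
}

int dispatch(unsigned short runtime_dimension) {
    static constexpr auto table =
        make_table(std::make_integer_sequence<unsigned short, 3>{});
    return table[runtime_dimension - 1](); // caller guarantees 1 <= d <= 3
}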