Search code examples
c++performancesimulationquadtree

How do I make this recursive function faster? (Quadtree)


I'm learning C++ and am doing something I'm comfortable with in java to start out. Particle simulation and flocking using a quadtree to cheaply find particles in a region. Everything is working but when I use the quadtree to get the particles from a region it's really slow (about 1s for 5000 calls).

I tried replacing the vector with an array and measuring the execution time of various parts of the function. Am I making any rooky mistakes like unnecessarily copying objects etc.? I'm using 5000 particles, I can't imagine 1fps is the fastest it can go.

Full code for minimal reproducable example as per request:

main.cpp

#include <string>
#include <iostream>
#include <random>
#include <chrono>
#include <thread>
#include <cmath>

#include "Particle.h"
#include "Quadtree.h"

// Clock
using namespace std::chrono;
using namespace std::this_thread;

// Global constants
const int SCREEN_WIDTH = 640;
const int SCREEN_HEIGHT = 480;
const int desiredFPS = 30;
const int frameTimeMS = int(1000 / (double)desiredFPS);
const int numberOfParticles = 5000;

// Random number generation
std::random_device dev;
std::mt19937 rng(dev());
std::uniform_real_distribution<> dist(0, 1);

Particle particles[numberOfParticles];
Quadtree quadtree = Quadtree(0, 0, SCREEN_WIDTH, SCREEN_HEIGHT);

int main(int argc, char* args[])
{
    for (int i = 0; i < numberOfParticles; i++)
    {
        particles[i] = Particle(dist(rng) * SCREEN_WIDTH, dist(rng) * SCREEN_HEIGHT);
    }

    // Clock for making all frames equally long and achieving the desired framerate when possible
    auto lapStartTime = system_clock::now();

    // Main loop
    for (int i = 0; i < 1; i++)
    {
        // Insert the particles into the quadtree
        quadtree = Quadtree(0, 0, SCREEN_WIDTH, SCREEN_HEIGHT);
        for (int i = 0; i < numberOfParticles; i++)
        {
            quadtree.insert(&particles[i]);
        }

        double neighbourhoodRadius = 40;
        for (int i = 0; i < numberOfParticles; i++)
        {
            // THIS IS THE PART THAT IS SLOW
            std::vector<Particle*> neighbours = quadtree.getCircle(
                particles[i].x,
                particles[i].y,
                neighbourhoodRadius
            );
        }

        // Update clocks
        auto nextFrameTime = lapStartTime + milliseconds(frameTimeMS);
        sleep_until(nextFrameTime);
        lapStartTime = nextFrameTime;
    }
    return 0;
}

Quadtree.h

#pragma once

#include <vector>
#include "Particle.h"
#include "Rect.h"

class Quadtree
{
public:
    const static int capacity = 10; // Capacity of any section
    Quadtree(double px, double py, double width, double height);
    Quadtree(Rect r);
    bool insert(Particle* p); // Add a particle to the tree
    std::vector<Particle*> getCircle(double px, double py, double r);
    int numberOfItems(); // Total amount in the quadtree
private:
    std::vector<Particle*> particles; // Particles stored by this section
    std::vector<Quadtree> sections; // Branches (only if split)
    Rect area; // Region occupied by the quadtree
    bool isSplit() { return sections.size() > 0; }
    void split(); // Split the quadtree into 4 branches
};

Quadtree.cpp

#include <iostream>
#include "Quadtree.h"

Quadtree::Quadtree(double px, double py, double width, double height)
{
    area = Rect(px, py, width, height);
    sections = {};
    particles = {};
}

Quadtree::Quadtree(Rect r)
{
    area = r;
    sections = {};
    particles = {};
}

bool Quadtree::insert(Particle* p)
{
    if (area.intersectPoint(p->x, p->y))
    {
        if (!isSplit() && particles.size() < capacity)
        {
            particles.push_back(p);
        }
        else
        {
            if (!isSplit()) // Capacity is reached and tree is not split yet
            {
                split();
            }

            // That this is a reference is very important!
            // Otherwise a copy of the tree will be modified
            for (Quadtree& s : sections)
            {
                if (s.insert(p))
                {
                    return true;
                }
            }
        }

        return true;
    }
    else
    {
        return false;
    }
}

std::vector<Particle*> Quadtree::getCircle(double px, double py, double r)
{
    std::vector<Particle*> selection = {};
    if (!isSplit())
    {
        // Add all particles from this section that lie within the circle
        for (Particle* p : particles)
        {
            double a = px - p->x;
            double b = py - p->y;
            if (a * a + b * b <= r * r)
            {
                selection.push_back(p);
            }
        }
    }
    else
    {
        // The section is split so add all the particles from the
        // branches together
        for (Quadtree& s : sections)
        {
            // Check if the branch and the circle even have any intersection
            if (s.area.intersectRect(Rect(px - r, py - r, 2 * r, 2 * r)))
            {
                // Get the particles from the branch and add them to selection
                std::vector<Particle*> branchSelection = s.getCircle(px, py, r);
                selection.insert(selection.end(), branchSelection.begin(), branchSelection.end());
            }
        }
    }
    return selection;
}

void Quadtree::split()
{
    sections.push_back(Quadtree(area.getSection(2, 2, 0, 0)));
    sections.push_back(Quadtree(area.getSection(2, 2, 0, 1)));
    sections.push_back(Quadtree(area.getSection(2, 2, 1, 0)));
    sections.push_back(Quadtree(area.getSection(2, 2, 1, 1)));

    std::vector<Particle*> oldParticles{ particles };
    particles.clear();

    for (Particle* p : oldParticles)
    {
        bool success = insert(p);
    }
}

int Quadtree::numberOfItems()
{
    if (!isSplit())
    {
        return particles.size();
    }
    else
    {
        int result = 0;
        for (Quadtree& q : sections)
        {
            result += q.numberOfItems();
        }
        return result;
    }
}

Particle.h

#pragma once

class Particle {
public:
    double x;
    double y;
    Particle(double px, double py) : x(px), y(py) {}
    Particle() = default;
};

Rect.h

#pragma once

class Rect
{
public:
    double x;
    double y;
    double w;
    double h;
    Rect(double px, double py, double width, double height);
    Rect() : x(0), y(0), w(0), h(0) {}
    bool intersectPoint(double px, double py);
    bool intersectRect(Rect r);
    Rect getSection(int rows, int cols, int ix, int iy);
};

Rect.cpp

#include "Rect.h"

Rect::Rect(double px, double py, double width, double height)
{
    x = px;
    y = py;
    w = width;
    h = height;
}

bool Rect::intersectPoint(double px, double py)
{
    return px >= x && px < x + w && py >= y && py < y + h;
}

bool Rect::intersectRect(Rect r)
{
    return x + w >= r.x && y + h >= r.y && x <= r.x + r.w && y <= r.y + r.w;
}

Rect Rect::getSection(int cols, int rows, int ix, int iy)
{
    return Rect(x + ix * w / cols, y + iy * h / rows, w / cols, h / rows);
}

Solution

  • So... In the original code creating the quadtree takes about 0.001s (relatively insignificant), and the neighbor search takes about 0.06s - here is our culprit (as mentioned by the OP).

    Passing the std::vector<Particle*> neighbours as a reference to the getCircle function, gets rid of the insert call at the end of the function as well as new vector allocations (hi to everyone saying "oh, it will be optimized away automatically"). The time is reduced to 0.011s.

    The nieghbours vector can be taken out of the main loop, and cleared after use, so that it only resizes on the first frame.

    I do not see any more immediately obvious targets (without doing a complete rewrite). Maybe I will add something later.


    I decided to approach this more systematically: I added an #if switch for every change I made and actually recorded some statistics, instead of eyeballing it. (Evey change is added incrementally, times include tree construction).

    original by reference out of loop
    min time: 0.0638s 0.0127s 0.0094s
    avg time: 0.0664s 0.0136s 0.0104s
    max time: 0.0713s 0.0157s 0.0137s

    All measurements were done on my machine, with optimized build, using QueryPerfoemanceCounter.


    I did end up rewriting the whole thing...

    Got rid of vectors.

    • The Quadtree::particles is now Particle* particles[capacity] with a count.
    • sections is a pointer; isSplit just checks if sections is 0.
    • Since the total (or maximum) number of particles is known, the number of particles that can be returned by getCircle can't be more than that. So I allocate that much outside of the main loop to store neighbours. Adding another result involves just bumping a pointer (without even a check in release). And resetting it after use is done by setting the count to 0 (see arena or bump allocator).
    • The maximum number of quadtree nodes can be inferred from the number of particles. So, similarly, splitting just bumps the pointer by 4.

    Trying to precompute the Rect in getCircle, or put px, py, r (and/or that rect as well) in a struct (passed as value or reference) does not yield any improvement (or is detremental). (was suggested by Goswin von Brederlow).

    Then I flipped the recursion (was suggested by Ted Lyngmo). The temporary stack is, again, preallocated. And then I did the same thing for insert.

    rewrite non-recursive insert as well
    min_time: 0.0077 0.0069 0.0068
    avg_time: 0.0089 0.0073 0.0070
    max_time: 0.0084 0.0078 0.0074

    So in the end the most impactful thing was the very first - not inserting and not creating unnecessary vectors every call, but instead passing the same one by reference.

    One last thing - might want to store the quadtree particles separately, since most of the time getCircle is traversing nodes, where particles are not stored.

    Otherwise, I do not see how to improve this any more. At this point it would require someone actually smart or crazy...