Faster nested loops over tens of thousands of particles

I'm doing some visual arts research inside the Unity environement. I'm trying to achieve something very similar to differential line growth as explained here but my main worry is that somewhere in the algorithm, every node should check every other node to see how close it is and construct an array of repulsion forces from all these closeby particles.

Here's a snippet of my code :

   public void Differentiate()
    {
        int c = nodes.Count;                                 
        Vector3[] repulsionForces = new Vector3[c];

        for (int i = 0; i < c ; i++)
        {

            // Construct nearbies
            List<DifferentialNode> nearby = new List<DifferentialNode>();
            foreach(DifferentialNode n in nodes)
            {
                float d = Vector3.Distance(n.position, nodes[i].position);
                if (d < 5)
                {
                    nearby.Add(n);
                }
            }
            // Get Forces
            Vector3 repulsionForce = nodes[i].RepulsionForce(nearby);

            // Limit Forces
            repulsionForce = Vector3.ClampMagnitude(repulsionForce, maxForce);

            // Apply Multipliers
            repulsionForce *= Repulsion;

            // Put Forces into Array
            repulsionForces[i] = repulsionForce;
        }

        for (int i = 0; i < c; i++)
        {
            nodes[i].applyForce(repulsionForces[i]);
            nodes[i].update();
            nodes[i].velocity = new Vector3(0, 0, 0);
        }

And here is my RepulsionForce() function in the DifferentialLineNode class

public Vector3 RepulsionForce(List<DifferentialNode> nearby)
{
    Vector3 repulsionForce = new Vector3();

    foreach (DifferentialNode n in nearby)
    {
        // calculate distance between both
        float d = Vector3.Distance(n.position, this.position);
        // calculate difference and divide by exp(d) to get less influence when far
        Vector3 diff = ( this.position - n.position ) / (Mathf.Exp(d)); 
        repulsionForce += diff;
    }
    repulsionForce /= (float)nearby.Count;
    repulsionForce.Normalize();

    return repulsionForce;
}

As soon as I start the game everything goes down to under 1fps and I gather that the nested loop is where it's coming from because of the n^n complexity. I've been looking into Octree / KdTree implementations but can't find any explained code. Are there other routes ? More than one ? Can any be combined ? Thanks a lot

Solution

Computing points within distance for each point is O(n^2), so it is not so strange that performance drops if the number of particles are large. But this can be fairly easily improved. There are several options for search structures that may be used:

3D grid. This should be simple to implement, simply create a 3D array of lists. Pick a suitable bin size so you have a fixed number of bins to iterate over. The major downside with this is memory usage, this is will be more significant if there are large voids with no particles in your simulation.
Sparse octree. The major advantage over a grid would be better memory usage if points are unevenly spaced. It would also scale better if you need to search an arbitrary distance. The downside would be more complex implementation.
KdTree. This is a quite nice data structure since it uses fairly little memory, can be implemented with contiguous memory, and scales well. One disadvantage is that it is difficult to handle changes in positions. The implementation is also more complex than a grid.

For a grid or Octree it would be possible to move points between bins. For a kd tree it would probably be better to rebuild the tree. Any of the options should reduce search times from O(N^2) to O(n log n).

Given the context of your question I would recommend starting with a simple 3D grid. If this is not sufficient I would consider using a kd tree. I would avoid combining datastructures unless there is some specific reason to do this. Octrees and kdtrees should scale well enough.

I would recommend trying some profiling tools to check what actually takes time. I would also recommend setting up some controlled environment to run the algorithm on a known dataset and measure the time. Benchmark.Net is the gold standard, but a simple stopwatch should be fairly good when measuring large changes.

There should also be some opportunity for micro-optimizations. Avoid creating lists in a tight loop, if needed, create a list that is reused. Avoid repeated operations. Avoid expensive operations. Prefer arrays and lists over other collections since the runtime has special optimizations for these. Prefer for instead of foreach since the former may avoid the creation of an iterator object.