Tags: c#, .net, task-parallel-library, .net-4.8

Writing repeating parallel loops for animation, avoiding System.Threading.Tasks.Parallel.For


I am writing a plugin to Rhino which animates some ray-traced data. Rhino handles all the drawing; I just have to provide it with something to draw.

The following Animate function is called from the UI thread using Task.Run(() => Animate(doc, cts.Token, Embree));. Within the function is a while loop which repeats until cancellation is requested through the CancellationToken or lengthtraveled > maxTimeLength (both controlled by the UI thread). The while loop computes each frame of the animation, i.e. the step of the individual points along their paths. When a point reaches a boundary, Embree computes the intersection of the approaching ray with that boundary and determines the direction and length of the reflected ray.

I would like to take advantage of the multi-core processor sitting in front of me and farm out the main for (int i = 0; i < sphereNum; i++) loop (marked below) into multiple concurrent threads.

Currently I am achieving a refresh rate of about 45 Hz on a single thread, but I would like it to perform better on less capable (but still multi-core) machines.

My impression of Parallel.For(int fromInclusive, int toExclusive, Action<int> body) is that it introduces a fair amount of overhead and would be spinning up a new set of threads each time it's called, which seems wasteful since I know that the same procedure will simply be repeated ad nauseam. So why not load several threads which each compute a portion of the points (25,000+) and then wait for the data to be drawn, after which they are signalled to compute the next animation step?

Here's the Animate function:

// Declared async Task because the loop below awaits Task.Delay.
private async Task Animate(RhinoDoc doc, CancellationToken cts, EMBContainer Embree)
{
    double refreshInterval = 1.0 / 45.0;
    int refreshMS = (int)(refreshInterval * 1000.0);
    double slowdownFactor = 1.0 / 20.0;
    double Speed = 10.0;
    double spherestep = refreshInterval * slowdownFactor * Speed;
    double lengthtraveled = 0.0;

    int sphereNum = 25000;
    spheres = new RaySphereData[sphereNum];
    points = new Rhino.Geometry.PointCloud(Enumerable.Repeat(StartPoint, sphereNum));
    int refreshcounter = 1, resettozero = 0;
    long waited = 0;

    System.Numerics.Vector3 start, direction;

    var vectors = StartingVectors();
    start = StartPoint.;
    for (int i = 0; i < sphereNum; i++)
    {
        direction  = vectors[i];
        Embree.RTCRayHit hits = Embree.Intersect(start, direction);

        points[i].Location = StartPoint;
        spheres[i].step = direction.ToVector3d();
        spheres[i].step *= spherestep;
        spheres[i].pathlength = hits.Ray.tfar;
        spheres[i].done = false;

        hits.Hit.Normalize();
        spheres[i].nextdirection = Vector3.Reflect(direction, hits.Hit.normal);
        spheres[i].nextstart = start + direction * hits.Ray.tfar;
    }

    Stopwatch stopwatch2 = new Stopwatch();
    Stopwatch stopwatch3 = new Stopwatch();
    double interimlength;

    Stopwatch stopwatch = Stopwatch.StartNew();
    Stopwatch stopwatch1 = Stopwatch.StartNew();

    while (true)
    {
        stopwatch2.Start();
        if (cts.IsCancellationRequested) return;

        var delay = Task.Delay(refreshMS); 

        doc.Views.Redraw();
        lengthtraveled += spherestep;
        if (lengthtraveled > maxTimeLength)
        {
            return;
        }
        else
        {
            // Start multi-threading here, i.e. send a Monitor, Barrier or Semaphore signal to the animation frame compute function.
            for (int i = 0; i < sphereNum; i++)
            {
                if (lengthtraveled > spheres[i].pathlength)
                {
                    do
                    {
                        interimlength = lengthtraveled - spheres[i].pathlength;
                        direction = spheres[i].nextdirection;
                        start = spheres[i].nextstart;
                        var hits = Embree.Intersect(start, direction);
                        spheres[i].pathlength += hits.Ray.tfar;
                        spheres[i].step = direction.ToVector3d();
                        spheres[i].step.Unitize();
                        points[i].Location = spheres[i].nextstart.ToPoint3d() + (spheres[i].step * interimlength);
                        spheres[i].nextstart = start + direction * hits.Ray.tfar;
                        hits.Hit.Normalize();
                        spheres[i].nextdirection = Vector3.Reflect(direction, hits.Hit.normal);
                    }
                    while (lengthtraveled > spheres[i].pathlength);

                    spheres[i].step *= spherestep;
                }
                else
                {
                    points[i].Location += spheres[i].step;
                }
            }
            // End multi-threading.
        }
        stopwatch3.Start();
        await delay;
        stopwatch3.Stop();
        stopwatch2.Stop();
        waited += stopwatch3.ElapsedMilliseconds;
        stopwatch3.Reset();
        stopwatch2.Reset();
        if(refreshcounter % 100 == 0)
        {
            stopwatch1.Stop();
            refreshInterval = (stopwatch1.ElapsedMilliseconds - waited) / (1000.0 * refreshcounter);
            refreshMS = (int)(refreshInterval * 1000.0);
            spherestep = refreshInterval * slowdownFactor * Speed;
            for (int i = 0; i < sphereNum; i++)
            {
                spheres[i].step.Unitize();
                spheres[i].step *= spherestep;
            }
            stopwatch1.Start();
        }
        refreshcounter++;
    }
}

private struct RaySphereData
{
    public double pathlength;
    public Rhino.Geometry.Vector3d step;
    public System.Numerics.Vector3 nextstart;
    public System.Numerics.Vector3 nextdirection;
    public bool done; // assigned in Animate's initialization loop
}

I have looked into Barrier, Monitor, and SemaphoreSlim, but either through my lack of understanding or the thinness of the documentation of those .NET features, I haven't been able to make them work. Are there any suggestions? (Also, if there are any better suggestions for adapting the refresh rate, I'm all for it.)


Solution

  • My impression of Parallel.For is that it introduces a fair amount of overhead and would be spinning up a new set of threads each time it's called.

    Your impression is wrong. By default, the Parallel.For method runs on the current thread plus threads borrowed from the ThreadPool, and those pooled threads are reused between calls rather than created anew each time. You can change the default by providing a custom TaskScheduler, but there is no reason to do that. What I would advise you to do is to specify the MaxDegreeOfParallelism, so that the parallel loop doesn't saturate the ThreadPool:

    // Explicit type: target-typed new() needs C# 9, newer than the .NET Framework 4.8 default.
    var parallelOptions = new ParallelOptions
    {
        MaxDegreeOfParallelism = Environment.ProcessorCount
    };
    
    Parallel.For(0, 25000, parallelOptions, i =>
    {
        // ...
    });
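
One thing to watch when moving the question's loop body into the lambda: direction, start, and interimlength are declared outside the loop there, so every thread would share them. They need to become locals inside the lambda. The following is a minimal, self-contained sketch of that pattern, not the actual plugin code. The Particle struct and the StepAll method are hypothetical stand-ins, and whether Embree.Intersect and writes to the PointCloud items are safe to call from several threads is an assumption you would have to verify separately.

```csharp
using System;
using System.Numerics;
using System.Threading.Tasks;

public static class ParallelStepSketch
{
    // Hypothetical stand-in for the question's RaySphereData.
    public struct Particle
    {
        public Vector3 Position;
        public Vector3 Step;
    }

    // Advances every particle by one frame. Each iteration reads and
    // writes only element i, so no locking is needed.
    public static void StepAll(Particle[] particles, ParallelOptions options)
    {
        Parallel.For(0, particles.Length, options, i =>
        {
            // Declared inside the lambda, so each iteration has its own copy.
            Vector3 step = particles[i].Step;
            particles[i].Position += step;
        });
    }

    public static void Main()
    {
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        var particles = new Particle[25000];
        for (int i = 0; i < particles.Length; i++)
            particles[i].Step = new Vector3(1f, 0f, 0f);

        // The same options object is reused for every frame.
        for (int frame = 0; frame < 3; frame++)
            StepAll(particles, options);

        Console.WriteLine(particles[0].Position.X); // 3
    }
}
```

Parallel.For returns only after all iterations have finished, so the frame is complete by the time the next Redraw is requested.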
    

    In case your application makes heavy use of the ThreadPool, you could consider increasing the threshold for immediate thread creation by using the ThreadPool.SetMinThreads API.
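
A minimal sketch of that adjustment follows. The helper name EnsureMinWorkerThreads and the factor of 2 are illustrative assumptions, not recommendations; ThreadPool.GetMinThreads and ThreadPool.SetMinThreads are the real APIs.

```csharp
using System;
using System.Threading;

public static class MinThreadsSketch
{
    // Raises the worker-thread minimum to at least `desired`, leaving the
    // I/O completion-port minimum unchanged. Returns false if the runtime
    // rejects the requested values.
    public static bool EnsureMinWorkerThreads(int desired)
    {
        ThreadPool.GetMinThreads(out int worker, out int io);
        if (worker >= desired) return true; // already high enough
        return ThreadPool.SetMinThreads(desired, io);
    }

    public static void Main()
    {
        // The factor of 2 is an arbitrary illustration.
        bool ok = EnsureMinWorkerThreads(Environment.ProcessorCount * 2);
        ThreadPool.GetMinThreads(out int worker, out int io);
        Console.WriteLine($"ok={ok}, worker={worker}, io={io}");
    }
}
```

The setting is process-wide, so it would normally be applied once at startup rather than per frame.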

  • So why not load several threads which each compute a portion of the points [...]?

    Because it's unlikely that your custom approach will be better than the tools provided by the .NET standard libraries. Most likely it will be worse, regarding either performance, behavior or correctness, or all of the above.
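
If what you want is for each thread to work through a contiguous portion of the points, the standard library already covers that through Partitioner.Create combined with Parallel.ForEach. The sketch below is a minimal illustration under assumptions: AddToAll and the double[] array are hypothetical stand-ins for the per-point update, and whether range partitioning actually beats a plain Parallel.For for your workload is something to measure, not something this sketch establishes.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class RangePartitionSketch
{
    // Adds `delta` to every element, handing each worker a contiguous
    // [start, end) range instead of a single index per delegate call.
    public static void AddToAll(double[] values, double delta)
    {
        // Partitioner.Create throws for an empty range, so return early.
        if (values.Length == 0) return;

        var ranges = Partitioner.Create(0, values.Length);
        Parallel.ForEach(ranges, range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                values[i] += delta;
        });
    }

    public static void Main()
    {
        var values = new double[25000];
        AddToAll(values, 0.5);
        AddToAll(values, 0.5);
        Console.WriteLine(values[24999]); // 1
    }
}
```

The delegate is invoked once per range rather than once per index, which is the usual reason for reaching for this overload when the per-item work is small.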