c#, cluster-computing, distributed-computing, orleans

Work distribution in Orleans


In Microsoft Orleans I am trying to implement something like a list of available work using the code below:

    public Task<WorkResponse> PerformWork(WorkRequest request)
    {
        Console.WriteLine("Performing work for id: {0}", request.Param1);
        Thread.Sleep(TimeSpan.FromSeconds(10));

        var result = Task.FromResult(new WorkResponse(request.Param1, request.Param2, request.Param3));
        Console.WriteLine("Completed work for id: {0}", request.Param1);

        return result;
    }

This works; however, if I start a number of tasks using code like this, things don't behave properly:

    _work
        .ToList()
        .AsParallel()
        .ForAll(x =>
        {
            Console.WriteLine("Requesting work for id: {0}", x.Key);
            var worker = GrainFactory.GetGrain<IWork>(x.Key);
            var response = worker.PerformWork(x.Value);

            Console.WriteLine("Response for work id: {0}", x.Key);
        });

This works; however, if another node joins the cluster, that work never seems to move to the new node. Only newly scheduled work is ever processed on the new node.

It also seems that if a lot of this extra work is sitting in the Orleans queue, new nodes get stuck joining the cluster.


Solution

  • Orleans uses a fixed number of worker threads to minimize the overhead of context switching and of managing many threads. Calling Thread.Sleep() blocks one of those threads for the full duration, which is going to cause trouble: the workers will be too busy sleeping to pull new work from the queue.

    What happens when you avoid Thread.Sleep(...) and use await Task.Delay(...) instead?

    The membership algorithm which Orleans uses requires that silos be responsive: slow silos are indistinguishable from dead silos.
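
As a rough illustration of the suggestion above, here is a minimal sketch of the same method written with await Task.Delay(...) instead of Thread.Sleep(...). It is plain C# with no Orleans dependency so that it can be run on its own: the WorkRequest and WorkResponse records (with string parameters), the Worker class, and the Program entry point are all assumptions made for the example, since the question does not show those types. The delay is shortened from the question's 10 seconds to keep the demo quick.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical shapes: the question does not show these types,
// so string parameters are assumed here.
public record WorkRequest(string Param1, string Param2, string Param3);
public record WorkResponse(string Param1, string Param2, string Param3);

// Stand-in for the grain class. This is NOT an Orleans grain; it only
// shows the shape of a method that waits without blocking its thread.
public class Worker
{
    public async Task<WorkResponse> PerformWork(WorkRequest request)
    {
        Console.WriteLine("Performing work for id: {0}", request.Param1);

        // Task.Delay returns an awaitable task; awaiting it releases the
        // current thread while the timer runs, unlike Thread.Sleep,
        // which holds the thread for the whole duration.
        await Task.Delay(TimeSpan.FromSeconds(1));

        Console.WriteLine("Completed work for id: {0}", request.Param1);
        return new WorkResponse(request.Param1, request.Param2, request.Param3);
    }
}

public static class Program
{
    public static async Task Main()
    {
        var worker = new Worker();

        // Three made-up requests, keyed "1".."3".
        var requests = Enumerable.Range(1, 3)
            .Select(i => new WorkRequest(i.ToString(), "a", "b"))
            .ToList();

        // Start every call, then await them all together.
        var responses = await Task.WhenAll(
            requests.Select(r => worker.PerformWork(r)));

        foreach (var response in responses)
        {
            Console.WriteLine("Response for work id: {0}", response.Param1);
        }
    }
}
```

In an actual grain the method body would have the same shape: mark the method async, await the delay (or the real asynchronous work), and return the response directly rather than wrapping it in Task.FromResult.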