c# .net azure-table-storage actor orleans

Orleans slow with minimalistic use case


I'm evaluating Orleans for a new project we are starting soon.

Eventually we want to run a bunch of persistent actors, but I'm currently struggling to get even a baseline in-memory version of Orleans to perform well.

Given the following grain

using Common.UserWallet;
using Common.UserWallet.Messages;
using Microsoft.Extensions.Logging;

namespace Grains;

public class UserWalletGrain : Orleans.Grain, IUserWalletGrain
{
    private readonly ILogger _logger;

    public UserWalletGrain(ILogger<UserWalletGrain> logger)
    {
        _logger = logger;
    }

    public async Task<CreateOrderResponse> CreateOrder(CreateOrderCommand command)
    {


        return new CreateOrderResponse(Guid.NewGuid());
    }

    public Task Ping()
    {
        return Task.CompletedTask;
    }
}

The following silo config:

static async Task<IHost> StartSiloAsync()
{
    ServicePointManager.UseNagleAlgorithm = false;

    var builder = new HostBuilder()
        .UseOrleans(c =>
        {
            c.UseLocalhostClustering()
            .Configure<ClusterOptions>(options =>
            {
                options.ClusterId = "dev";
                options.ServiceId = "OrleansBasics";
            })
            .ConfigureApplicationParts(
                parts => parts.AddApplicationPart(typeof(HelloGrain).Assembly).WithReferences())

            .AddMemoryGrainStorage("OrleansMemoryProvider");
        });

    var host = builder.Build();
    await host.StartAsync();

    return host;
}

And the following client code:

static async Task<IClusterClient> ConnectClientAsync()
{
    var client = new ClientBuilder()
        .UseLocalhostClustering()
        .Configure<ClusterOptions>(options =>
        {
            options.ClusterId = "dev";
            options.ServiceId = "OrleansBasics";
        })
        //.ConfigureLogging(logging => logging.AddConsole())
        .Build();

    await client.Connect();
    Console.WriteLine("Client successfully connected to silo host \n");

    return client;
}

static async Task DoClientWorkAsync(IClusterClient client)
{
    List<IUserWalletGrain> grains = new List<IUserWalletGrain>();

    foreach (var _ in Enumerable.Range(1, 100))
    {
        var walletGrain = client.GetGrain<IUserWalletGrain>(Guid.NewGuid());
        await walletGrain.Ping(); //make sure grain is loaded
        grains.Add(walletGrain);
    }

    var sw = Stopwatch.StartNew();
    await Parallel.ForEachAsync(Enumerable.Range(1, 100000), async (o, token) =>
    {
        var command = new Common.UserWallet.Messages.CreateOrderCommand(Guid.NewGuid(), 4, 5, new List<Guid> { Guid.NewGuid(), Guid.NewGuid() });

        var response = await grains[o % 100].CreateOrder(command);

        Console.WriteLine($"{o%10}:{o}");
    });

    Console.WriteLine($"\nElapsed:{sw.ElapsedMilliseconds}\n\n");
}

I'm able to send 100,000 messages in 30 seconds, which amounts to about 3,333 messages per second. This is far less than I would expect based on https://github.com/yevhen/Orleans.PingPong.

It also does not seem to matter whether I start off with 10, 100, or 1,000 grains.

When I then add persistence with table storage configured

.AddAzureTableGrainStorage(
        name: "OrleansMemoryProvider",
        configureOptions: options =>
        {
            options.UseJson = true;
            options.ConfigureTableServiceClient(
                "secret");
        })

and add a single await WriteStateAsync(); call to CreateOrder, things get drastically worse: about 280 msgs/s.

When I go a bit further and implement some basic domain logic (calling other actors and so on), throughput slows to a snail's pace at 1.2 msgs/s.

What gives?

EDIT:

  • My CPU usage is at about 50%.

Solution

  • Building high performance applications can be tricky and nuanced. The general solution in Orleans is that you have many grains and many callers, so you can achieve a high degree of concurrency and thus throughput. In your case, you have many grains (100), but you have few callers (I believe it's one per core by default with Parallel.ForEachAsync), and each caller is writing to the console after every call, which will slow things down substantially.

    If I remove the Console.WriteLine and run your code on my machine using Orleans 7.0-rc2, the 100K calls to 100 grains finish in about 850ms. If I change the CreateOrderCommand and CreateOrderResponse types from classes to structs, the duration decreases to 750ms.
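A minimal sketch of a quieter, more concurrent load loop, to make those two changes concrete. It is plain C# with no Orleans dependency: `LoadDriver`, `RunAsync`, and `maxInFlight` are hypothetical names introduced here, and the `call` delegate stands in for whatever grain method you want to measure.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class LoadDriver
{
    // Issues `total` calls through `call`, allowing up to `maxInFlight`
    // of them to be outstanding at once, and returns the elapsed time.
    // Nothing is written to the console inside the loop; report the
    // result once, after the run has finished.
    public static async Task<TimeSpan> RunAsync(
        Func<int, Task> call, int total, int maxInFlight)
    {
        // Without explicit options, Parallel.ForEachAsync limits itself
        // to Environment.ProcessorCount concurrent iterations.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxInFlight };

        var sw = Stopwatch.StartNew();
        await Parallel.ForEachAsync(
            Enumerable.Range(0, total),
            options,
            async (i, token) => await call(i));
        sw.Stop();

        return sw.Elapsed;
    }
}
```

With the client code from the question, this could be called as `await LoadDriver.RunAsync(i => grains[i % grains.Count].CreateOrder(command), 100_000, 256)`, followed by a single Console.WriteLine of the elapsed time. The value 256 is an arbitrary starting point, not a recommendation; the right number depends on your machine and workload.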

    If I run a more optimized ping test (the one from the Orleans repository), I see approximately 550K requests per second on my machine with one client and one silo process sharing the same CPU. The numbers are approximately half this for Orleans 3.x. If I co-host the client within the silo process (i.e., pull IClusterClient from the silo's IServiceProvider) then I see over 5M requests per second.

    Once you start doing non-trivial amounts of work in each of your grains, you're going to start running up against other limits. I tested calling a single grain from within the same process recently and found that one grain can handle 500K RPS if it is doing trivial work (ping-pong). If the grain has to write to storage on every request and each storage write takes 1ms then it will not be able to handle more than 1000 RPS, since each call waits for the previous call to finish by default. If you want to opt out of that behavior, you can do so by enabling reentrancy on your grain as described in the documentation here: https://learn.microsoft.com/en-us/dotnet/orleans/grains/reentrancy. The Chirper example has more details on how to implement reentrancy with storage updates: https://github.com/dotnet/orleans/tree/main/samples/Chirper.
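To illustrate why one-call-at-a-time processing caps throughput at roughly 1 / (per-call latency), here is a small simulation. It is not Orleans code and does not use the Orleans scheduler: `SimulatedGrain` is a hypothetical stand-in in which a SemaphoreSlim mimics non-reentrant turns and Task.Delay mimics a storage write. The 5 ms delay is an arbitrary assumption.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Plain C# stand-in for a grain whose method awaits a slow storage write.
public sealed class SimulatedGrain
{
    private readonly SemaphoreSlim _turn = new SemaphoreSlim(1, 1);
    private readonly object _statsLock = new object();
    private readonly bool _reentrant;
    private readonly TimeSpan _writeLatency;
    private int _inFlight;

    public SimulatedGrain(bool reentrant, TimeSpan writeLatency)
    {
        _reentrant = reentrant;
        _writeLatency = writeLatency;
    }

    // Highest number of calls that were inside HandleAsync at the same time.
    public int MaxObservedInFlight { get; private set; }

    public async Task HandleAsync()
    {
        // Non-reentrant mode: each call waits for the previous one to finish.
        if (!_reentrant) await _turn.WaitAsync();
        try
        {
            int now = Interlocked.Increment(ref _inFlight);
            lock (_statsLock)
            {
                MaxObservedInFlight = Math.Max(MaxObservedInFlight, now);
            }
            await Task.Delay(_writeLatency); // simulated storage write
        }
        finally
        {
            Interlocked.Decrement(ref _inFlight);
            if (!_reentrant) _turn.Release();
        }
    }

    // Starts `calls` requests at once and reports the peak overlap.
    public static async Task<int> RunAsync(bool reentrant, int calls)
    {
        var grain = new SimulatedGrain(reentrant, TimeSpan.FromMilliseconds(5));
        await Task.WhenAll(Enumerable.Range(0, calls).Select(_ => grain.HandleAsync()));
        return grain.MaxObservedInFlight;
    }
}
```

In the non-reentrant mode the waits queue up behind each other, so the total time grows with calls × latency; in the reentrant mode the waits overlap. In a real grain, overlapping calls also means state can change across an await, which is the trade-off the reentrancy documentation linked above discusses.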

    When grain methods become more complex and grains need to perform significant amounts of I/O to serve each request (for example, storage updates and subsequent grain calls), the throughput of each individual grain will decrease since each request involves more work. Hopefully, the above numbers give you an approximate guide.