Azure Service Fabric - Distributed computation code sample Monte Carlo simulation - performance issues

Having listened to recent Azure podcasts (particularly the one on building low-latency financial systems on Azure) and read all the hype about Service Fabric, I decided to try adapting the 'Distributed computation code sample Monte Carlo simulation' pattern to my needs.

My scenario is: one request, with a given starting state, to run 10k full sports match simulations using a computationally simple Monte Carlo model.

My first attempt was:

- 1 * Stateful 'Processor' Actor that receives the start state of the match and forwards it to 10k+ Task Actors, along with the relevant Aggregator ActorId
- 10k+ * Stateless 'Task' Actors that each ran 1 simulation and passed the result to their Aggregator Actor. Simulation time was small (~2ms)
- 100 * Stateful 'Aggregator' Actors that aggregated received simulations and passed to a finaliser Actor
- 1 * 'Finaliser' Actor that calculated the final result

Running the same work on my dev box simply using Tasks takes < 100ms, but the actor setup above (running on the dev machine as a local cluster) took 50 seconds or more!
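For reference, the in-process baseline has roughly the shape below. This is only a minimal Python sketch of the same Processor / Task / Aggregator / Finaliser roles, not the actual code: the toy scoring model, its probabilities, the `start_state` fields and all function names are hypothetical, and a thread pool stands in for .NET Tasks.

```python
import random
from concurrent.futures import ThreadPoolExecutor

NUM_SIMULATIONS = 10_000   # one "Task" per simulation
NUM_AGGREGATORS = 100      # partial results are grouped into 100 buckets


def simulate_match(start_state, seed):
    """Task role. Hypothetical toy model: each remaining minute, one side may score."""
    rng = random.Random(seed)
    home = start_state["home_goals"]
    away = start_state["away_goals"]
    for _ in range(start_state["minutes_left"]):
        roll = rng.random()
        if roll < 0.015:        # assumed home scoring chance per minute
            home += 1
        elif roll < 0.027:      # assumed away scoring chance per minute
            away += 1
    return home, away


def aggregate(results):
    """Aggregator role: count outcomes for one bucket of simulations."""
    counts = {"home": 0, "draw": 0, "away": 0}
    for home, away in results:
        if home > away:
            counts["home"] += 1
        elif home < away:
            counts["away"] += 1
        else:
            counts["draw"] += 1
    return counts


def finalise(partials, total):
    """Finaliser role: merge the partial counts into outcome probabilities."""
    merged = {"home": 0, "draw": 0, "away": 0}
    for partial in partials:
        for outcome, count in partial.items():
            merged[outcome] += count
    return {outcome: count / total for outcome, count in merged.items()}


def run(start_state):
    """Processor role: fan out the simulations, then aggregate and finalise."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda seed: simulate_match(start_state, seed),
                                range(NUM_SIMULATIONS)))
    # Simulation i goes to aggregator i % NUM_AGGREGATORS.
    buckets = [results[i::NUM_AGGREGATORS] for i in range(NUM_AGGREGATORS)]
    partials = [aggregate(bucket) for bucket in buckets]
    return finalise(partials, len(results))
```

Calling `run({"home_goals": 0, "away_goals": 0, "minutes_left": 90})` returns a dict of home/draw/away probabilities. Everything here is a local function call, so none of the per-call messaging cost of the actor version is present.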

While debugging, one potential cause I found was the amount of time it takes the Processor Actor to send out the initial tasks. So I was wondering what sort of overhead there is in calling Service Fabric (I guess all sorts of Naming service calls happen when I call an actor's methods), and whether the slowness is likely to be due to this overhead combined with my number of tasks?

To eliminate other possibilities I did the following, and noticed only very small differences in total time:

- Made all actors stateless to ensure that state management wasn't adding overhead.
- Created all ActorProxies in the Processor and stored their references for future calls to ensure Actor activations weren't causing issues.

Does anybody have any suggestions about where to go from here, or has anybody tried to implement something similar?

Thanks, Alex

Solution

I would have posted this as a comment, but I do not yet have enough reputation for that! If you reference this page in Service Fabric's documentation, take a look at the comments below the article, particularly the comment trail started by "tom" sometime around June 2015. He was experiencing poor performance (~20 operations per second) with stateful actors, which seemed to be acknowledged as an area for future improvement. They stressed the use of readonly attributes on non-mutating methods to significantly improve performance. Abhishek Ram also included some notes and a link to information on relevant performance counters that may help with troubleshooting.

You noted that you tried using stateless actors with little impact on performance. I would point further down the comment trail where another user reports achieving 2k+ operations per second on a single actor using readonly methods, which I would expect to perform similarly to stateless actor methods. Perhaps the information from the performance counters can be compared with this to see how closely your performance is matching their somewhat trivial example in the comments.
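As a rough sanity check, the figures quoted above can be put side by side. The sketch below is only back-of-envelope arithmetic on numbers from the question and the comment trail; it assumes calls are processed at a steady rate, and the quoted rates are for a single actor, so they will not map exactly onto a setup with many actors. The function name is just for illustration.

```python
def seconds_for(calls, ops_per_second):
    """Time to process `calls` actor calls at a steady rate (simplifying assumption)."""
    return calls / ops_per_second


CALLS = 10_000  # one call per simulation task, ignoring the aggregator hops

# Rate implied by the question: 10k simulations in about 50 seconds.
observed_rate = CALLS / 50.0

# Rates reported in the comment trail, per single actor.
stateful_rate = 20      # "tom", stateful actor
readonly_rate = 2_000   # another user, readonly methods

print(f"observed: {observed_rate:.0f} simulations/s")
print(f"10k calls at {stateful_rate} ops/s: {seconds_for(CALLS, stateful_rate):.0f} s")
print(f"10k calls at {readonly_rate} ops/s: {seconds_for(CALLS, readonly_rate):.0f} s")
```

On these numbers your setup is running at about 200 simulations per second overall. Even at the 2k operations per second reported for readonly methods, 10k individual actor calls would take around 5 seconds, which is still well above the < 100ms you see in-process.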