Akka Stateful Actor Replication Across Nodes

I have an Akka application that I run as a web application using Play Framework. When I start this application, I create a set of actors, and these actors have state. These actors respond to external messages and mutate the state (I mutate the state using the context.become(...) mechanism).

I now want to run multiple instances of this Web application so that I can have resiliency. But the problem with this is that the web application exposes a WebSocket endpoint to which a Front end application connects. I then stream the state of the Actor instance every 4 seconds via the WebSocket endpoint to the Front end application.

I have a few questions here:

How can I run a couple more instances of my Play Framework web application, keeping in mind that I have stateful actors? I want to make sure that if one of the instances goes down, the stateful actor in that instance is resurrected with the state as it was just before it went down. Would Akka Cluster Sharding be the way to go?
Say I have 10000 of these Stateful actors in my application, how can I leverage Akka Cluster Sharding such that not all of these 10000 actors are running in the same node? I mean how can I make the first 5000 actors run on node1 and the next 5000 run on node2? Right now what I do with the single instance is that when I start my application, I read the database and I use the data to start one actor instance per database row.
How does the ask pattern work with cluster sharding? Any message that I send to the Shard Region will be routed to the corresponding actor instance, but what about asking a specific actor for a reply? Will it work the same way? That is, I send my ask request to the Shard Region, and the Shard Region forwards it to the corresponding actor. Is this correct?

Any suggestions?

Solution

There are two different ways to set up an Akka cluster to do that:

Have all your Play Framework instances form a cluster as you suggested
Set up a separate Akka cluster (call it backend if you want), and use the cluster client from the Play Framework instances to connect to and interact with that cluster

In both cases, I would use the Cluster Sharding module for what you want to achieve.

You don't make it clear how your actors' state is currently persisted. Are all the state changes somehow written back to the database? Akka Persistence might help you here if nothing persists the state yet.
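As a rough illustration of the idea behind Akka Persistence (store every state change as an event, then rebuild the state by replaying those events after a restart), here is a minimal sketch in plain Scala. The `Event` types, the `State` class, and the `recover` helper are hypothetical names made up for this example; they are not part of Akka, and a real persistent actor would use the Akka Persistence API and a journal plugin instead of an in-memory `Seq`.

```scala
object PersistenceSketch {
  // Hypothetical events: one per kind of state change the actor can make.
  sealed trait Event
  final case class ValueSet(value: Int) extends Event
  final case class Incremented(by: Int) extends Event

  // Hypothetical immutable actor state; each event produces a new state.
  final case class State(value: Int) {
    def updated(event: Event): State = event match {
      case ValueSet(v)     => copy(value = v)
      case Incremented(by) => copy(value = value + by)
    }
  }

  // State of an actor that has never received a message.
  val empty: State = State(0)

  // Recovery: replay the stored events, oldest first, over the empty state.
  def recover(journal: Seq[Event]): State =
    journal.foldLeft(empty)((state, event) => state.updated(event))
}
```

With this shape, an actor that is restarted on another node only needs access to its stored events to get back to the state it had before the crash, which fits well with state changes done through context.become(...).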

To your questions:

Cluster sharding will take care of splitting your actors across multiple running instances and migrating them when an instance goes down or comes up. Cluster sharding will not deal with your actor's state at all. This is a completely separate concern, and this is why I mentioned Akka Persistence above.
The key to a good distribution of your actors when using cluster sharding is to provide a good extractShardId and enough shards. You will most likely not have exactly 5000 actors per node as in your example, but it can be close enough. I would recommend starting with a high number of shards, much bigger than the number of cluster instances you are planning to use; 100 is OK to start with. This plus a good extractShardId will split your load approximately evenly among the cluster nodes.
Ask works just fine; what you describe is correct. The sharding system uses a combination of extractShardId and extractEntityId to route your message to the correct actor instance. That actor can then just reply to sender().
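To make the routing part more concrete, here is a minimal sketch of the two extractor functions in plain Scala. The `Envelope` message type, the `entitiesPerShard` helper, and the shard count of 100 are assumptions made for illustration. The two function shapes mirror Akka's `ShardRegion.ExtractEntityId` and `ShardRegion.ExtractShardId` type aliases, written out by hand so that the snippet runs without Akka on the classpath.

```scala
object ShardingSketch {
  // Hypothetical envelope: every message carries the id of its target entity.
  final case class Envelope(entityId: String, payload: Any)

  // Assumed shard count, well above the expected number of cluster nodes.
  val numberOfShards: Int = 100

  // Same shape as ShardRegion.ExtractEntityId: returns the entity id
  // and the message that should be delivered to that entity.
  val extractEntityId: PartialFunction[Any, (String, Any)] = {
    case Envelope(id, payload) => (id, payload)
  }

  // Same shape as ShardRegion.ExtractShardId: maps a message to a shard id.
  // floorMod keeps the result in 0 until numberOfShards even when
  // hashCode is negative.
  val extractShardId: Any => String = {
    case Envelope(id, _) => Math.floorMod(id.hashCode, numberOfShards).toString
    case other =>
      throw new IllegalArgumentException(s"Cannot extract shard id from $other")
  }

  // Helper for inspection: how many of the given entity ids land in each shard.
  def entitiesPerShard(ids: Seq[String]): Map[String, Int] =
    ids
      .groupBy(id => extractShardId(Envelope(id, ())))
      .map { case (shard, members) => shard -> members.size }
}
```

In a real application, functions of this shape would be passed as the extractEntityId and extractShardId arguments of `ClusterSharding(system).start(...)`. Any message sent or asked to the resulting shard region is wrapped in the envelope, and the same two functions decide which shard and which entity receive it. Calling `entitiesPerShard` on your real ids (for example the primary keys of your database rows) is a cheap way to check how evenly a given hash spreads them before deploying.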

Notes:

All of the above also works fine through a cluster client.
Depending on what you need, you might not have to start all the actors right from the start. When you send a message to a sharded actor which doesn't exist yet, that actor will be automatically and transparently started by the cluster sharding system. At that time you can restore its state. This plus automatic passivation can help you save a lot of resources!