How do I implement Failover within an Akka.NET cluster using the Akka.FSharp API?
I have the following cluster node that serves as a seed:
open Akka
open Akka.FSharp
open Akka.Cluster
open System
open System.Configuration
let systemName = "script-cluster"
let nodeName = sprintf "cluster-node-%s" Environment.MachineName
let akkaConfig = Configuration.parse("""akka {
actor {
provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
}
remote {
log-remote-lifecycle-events = off
helios.tcp {
hostname = "127.0.0.1"
port = 2551
}
}
cluster {
roles = ["seed"] # custom node roles
seed-nodes = ["akka.tcp://script-cluster@127.0.0.1:2551"]
# when node cannot be reached within 10 sec, mark is as down
auto-down-unreachable-after = 10s
}
}""")
let actorSystem = akkaConfig |> System.create systemName
let clusterHostActor =
spawn actorSystem nodeName (fun (inbox: Actor<ClusterEvent.IClusterDomainEvent>) ->
let cluster = Cluster.Get actorSystem
cluster.Subscribe(inbox.Self, [| typeof<ClusterEvent.IClusterDomainEvent> |])
inbox.Defer(fun () -> cluster.Unsubscribe(inbox.Self))
let rec messageLoop () =
actor {
let! message = inbox.Receive()
// TODO: Handle messages
match message with
| :? ClusterEvent.MemberJoined as event -> printfn "Member %s Joined the Cluster at %O" event.Member.Address.Host DateTime.Now
| :? ClusterEvent.MemberLeft as event -> printfn "Member %s Left the Cluster at %O" event.Member.Address.Host DateTime.Now
| other -> printfn "Cluster Received event %O at %O" other DateTime.Now
return! messageLoop()
}
messageLoop())
I then have an arbitrary node that could die:
open Akka
open Akka.FSharp
open Akka.Cluster
open System
open System.Configuration
let systemName = "script-cluster"
let nodeName = sprintf "cluster-node-%s" Environment.MachineName
let akkaConfig = Configuration.parse("""akka {
actor {
provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
}
remote {
log-remote-lifecycle-events = off
helios.tcp {
hostname = "127.0.0.1"
port = 0
}
}
cluster {
roles = ["role-a"] # custom node roles
seed-nodes = ["akka.tcp://script-cluster@127.0.0.1:2551"]
# when node cannot be reached within 10 sec, mark is as down
auto-down-unreachable-after = 10s
}
}""")
let actorSystem = akkaConfig |> System.create systemName
let listenerRef =
spawn actorSystem "temp2"
<| fun mailbox ->
let cluster = Cluster.Get (mailbox.Context.System)
cluster.Subscribe (mailbox.Self, [| typeof<ClusterEvent.IMemberEvent>|])
mailbox.Defer <| fun () -> cluster.Unsubscribe (mailbox.Self)
printfn "Created an actor on node [%A] with roles [%s]" cluster.SelfAddress (String.Join(",", cluster.SelfRoles))
let rec seed () =
actor {
let! (msg: obj) = mailbox.Receive ()
match msg with
| :? ClusterEvent.MemberRemoved as actor -> printfn "Actor removed %A" msg
| :? ClusterEvent.IMemberEvent -> printfn "Cluster event %A" msg
| _ -> printfn "Received: %A" msg
return! seed () }
seed ()
What is the recommended practice for implementing failover within a cluster?
Specifically, is there a code example of how a cluster should behave when one of its nodes is no longer available?
First of all it's a better idea to rely on MemberUp and MemberRemoved events (both implementing ClusterEvent.IMemberEvent interface, so subscribe for it), as they mark phases, when node joining/leaving procedure has been completed. Joined and left events doesn't necessarily ensure that node is fully operable at signaled point in time.
Regarding failover scenario:
Terminated
message to all of its watchers. This way you may implement your own logic i.e. recreating actor on another node. This is actually the most generic way, as it doesn't use any extra plugins or configuration, but the behavior needs to be described by yourself.