Search code examples
scalaakkafault-tolerance

Akka + WithinTimeRange


I've testing the fault tolerant system of akka and so far it's been good when talking about retrying to send a msg according the maxNrOfRetries specified.

However, it does not restart the actor within the given time range, it restarts all at once, ignoring the within time range.

I tried with AllForOneStrategy and OneForOneStrategy but does not change anything.

Trying to follow this blog post: http://letitcrash.com/post/23532935686/watch-the-routees, this is the code I've been working.

class Supervisor extends Actor with ActorLogging {

  var replyTo: ActorRef = _

  val child = context.actorOf(
    Props(new Child)
      .withRouter(
        RoundRobinPool(
          nrOfInstances = 5,
          supervisorStrategy =
            AllForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 10.second) {
              case _: NullPointerException     => Restart
              case _: Exception                => Escalate
            })), name = "child-router")

  child ! GetRoutees

  def receive = {
    case RouterRoutees(routees) =>
      routees foreach context.watch

    case "start" =>
      replyTo = sender()
      child ! "error"

    case Terminated(actor) =>
      replyTo ! -1
      context.stop(self)
  }
}

class Child extends Actor with ActorLogging {

  override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
    log.info("***** RESTARTING *****")
    message foreach{ self forward }
  }

  def receive = LoggingReceive {
    case "error" =>
      log.info("***** GOT ERROR *****")
      throw new NullPointerException
  }
}

object Boot extends App {

  val system = ActorSystem()
  val supervisor = system.actorOf(Props[Supervisor], "supervisor")

  supervisor ! "start"

}

Am I doing anything wrong to accomplish that?

EDIT

Actually, I misunderstood the purpose of the withinTimeRange. To schedule my retries in a time range, I'm doing the following:

override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
    log.info("***** RESTARTING *****")
    message foreach { msg =>
      context.system.scheduler.scheduleOnce(30.seconds, self, msg)
    }
  }

It seems to work ok.


Solution

  • I think you have misunderstood the purpose of the withinTimeRange arg. That value is supposed to be used in conjunction with maxNrOfRetries to provide a window in which to support the limiting of the number of retries. For example, as you have specified, the implication is that the supervisor will no longer restart an individual child if that child needs to be restarted more than 3 times in 10 seconds.