
How can a supervisor that reached_max_restart_intensity only delete the offending child?


I have a one_for_one supervisor that handles similar and totally independent children.

When one child has a problem and crashes repeatedly, the supervisor eventually reports:

=SUPERVISOR REPORT==== 30-Mar-2011::13:10:42 ===
     Supervisor: {local,gateway_sup}
     Context:    shutdown
     Reason:     reached_max_restart_intensity
     Offender:   [{pid,<0.76.0>}, ...

It then shuts itself down, which also terminates all the innocent children that would otherwise keep running fine.

How can I build a supervision tree out of standard Erlang supervisors that stops restarting only the one offending child and leaves the others alone?

I was thinking about having an extra supervisor with just a single child, but this seems too heavyweight to me.

Any other ways to handle this?


Solution

  • I think the best solution would be to have two layers of supervision.

    The top-level supervisor starts a supervisor + process pair for each gen_server you want running. It is configured with the one_for_one strategy and temporary children.

    Each lower-level supervisor running under it has appropriately configured MaxR and MaxT values, so that it shuts itself down once its child misbehaves.

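    When a lower-level supervisor terminates this way, the top-level supervisor "just doesn't care": a temporary child is never restarted, so the failure does not count towards the top-level restart intensity and the sibling pairs keep running.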

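    A supervisor started with one child has a total heap size of 233 (erlang:process_info/2 reports total_heap_size in words rather than bytes, so this is under 2 KB on a 64-bit VM), so memory consumption should not be an issue.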

    The supervision tree should look like:

    supervisor_top
        |
        |
        +------------------------+-----    ...
        |                        |
     supervisor_1               supervisor_2
     restart temporary          restart temporary
        |                        |
      gen_server_1              gen_server_2
      restart transient         restart transient