Elixir - Erlang: is there a "reasonable" limit of children handled by a supervisor?


I'm working with Elixir but I believe this question applies to Erlang as well.

I'm working on a system that might create tens of thousands of groups of processes of the same kind. Each group will have two workers and a local supervisor of its own. The question is: who will supervise the local supervisors?

I can imagine two strategies:

  1. One big supervisor that handles all local supervisors. This method is simple, yet I believe the supervisor will need to traverse its huge list of children whenever something happens to a child, which will be a heavy operation.
  2. A partitioned tree. For example, a set of intermediate supervisors, each supervising about 1000 local supervisors, with a global supervisor handling the intermediate ones. To create a new group, the global supervisor will need to find the intermediate supervisor with the fewest children and delegate the creation to it.

Does either make sense, or is there another way? Any advice is welcome.


Solution

  • "It depends".

    "Huge list" and "thousands" really are in different realms. Simple iteration is fast on modern machines. Up to high five-figure or low six-figure item counts, I would have no qualms about a system that regularly has to traverse a list that size, and beyond that I probably wouldn't really care either:

    iex(2)> list = Enum.to_list 1..1_000_000; :timer.tc(fn -> Enum.sum list end)
    {24497, 500000500000}

    (:timer.tc reports microseconds, so that is about 25 ms for the list traversal and some arithmetic. I'm usually happy if a crashed process gets restarted with such a small delay.)

    Of course - at the end of the day you're expected to do your own performance testing, compare the outcomes with the expected local supervisor crash rate, look up your system's requirements, and compare all these figures to come to an answer.
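
    A rough way to get such a figure is to time a restart directly. The sketch below is an illustration under assumptions, not a benchmark: it uses a DynamicSupervisor as the single global supervisor, plain Agent processes as hypothetical stand-ins for the local supervisors, and a child count of 10_000 picked arbitrarily.

    ```elixir
    parent = self()

    # Hypothetical stand-in for a local supervisor: an Agent that announces
    # every (re)start to `parent`, so the restart can be observed from outside.
    child = %{
      id: :group,
      start: {Agent, :start_link, [fn -> send(parent, {:up, self()}) end]}
    }

    {:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

    pids =
      for _ <- 1..10_000 do
        {:ok, pid} = DynamicSupervisor.start_child(sup, child)

        # Consume this child's start-up notification so the mailbox stays empty.
        receive do
          {:up, ^pid} -> pid
        end
      end

    victim = Enum.random(pids)

    # Time from killing one child until its replacement reports in.
    {micros, restarted} =
      :timer.tc(fn ->
        Process.exit(victim, :kill)

        receive do
          {:up, new_pid} -> new_pid
        end
      end)

    IO.puts("restarted #{inspect(victim)} as #{inspect(restarted)} in #{micros} microseconds")
    ```

    A single sample is noisy, so repeat the kill and look at the spread. Killing children in quick succession needs a higher :max_restarts than the default of 3 restarts within 5 seconds, otherwise the supervisor itself shuts down.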

    In the meantime, use the simplest thing that can possibly work: a single global supervisor supervising a flat hierarchy.
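
    A minimal sketch of that flat layout, assuming Elixir's DynamicSupervisor as the global supervisor. GroupSupervisor, its two Agent workers, and the :one_for_one strategy inside each group are hypothetical placeholders for the real group processes.

    ```elixir
    defmodule GroupSupervisor do
      # Hypothetical local supervisor: one per group, owning that group's two workers.
      use Supervisor

      def start_link(group_id), do: Supervisor.start_link(__MODULE__, group_id)

      @impl true
      def init(group_id) do
        children = [
          # Agents stand in for the real workers; distinct ids keep the specs unique.
          Supervisor.child_spec({Agent, fn -> {:worker_a, group_id} end}, id: :worker_a),
          Supervisor.child_spec({Agent, fn -> {:worker_b, group_id} end}, id: :worker_b)
        ]

        # Assumption: the two workers may restart independently of each other.
        Supervisor.init(children, strategy: :one_for_one)
      end
    end

    # The single global supervisor: every local supervisor is a direct child.
    {:ok, global} = DynamicSupervisor.start_link(strategy: :one_for_one)

    for group_id <- 1..10_000 do
      {:ok, _pid} = DynamicSupervisor.start_child(global, {GroupSupervisor, group_id})
    end

    counts = DynamicSupervisor.count_children(global)
    IO.inspect(counts, label: "global supervisor")
    ```

    If measurements later favor the partitioned tree, Elixir 1.14 and later ship a PartitionSupervisor that can sit in front of several DynamicSupervisors. Note that it picks a partition from a routing key rather than by child count, so it is not the "least children" scheme from the question.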