Tags: erlang, elixir, gen-server

How to distribute supervised gen_server workers?


Hi, I want to implement distributed caches as an exercise. The cache module is based on gen_server, and the caches are started by a CacheSupervisor module. At first I ran everything on one node, which worked well. Now I am trying to distribute my caches across two nodes, which run in two open console windows on my laptop.

PS:

While writing this question I realised that I had forgotten to connect my third window to the other nodes. I fixed that, but I still get the original error.

Consoles:

[Screenshot: node consoles]

After connecting my nodes, I call CacheSupervisor.start_link() in my third window, which results in the following error message:

Error:

** (EXIT from #PID<0.112.0>) shutdown: failed to start child: :de
    ** (EXIT) an exception was raised:
        ** (ArgumentError) argument error
            erlang.erl:2619: :erlang.spawn(:node1@DELL_XPS, {:ok, #PID<0.128.0>})
            (stdlib) supervisor.erl:365: :supervisor.do_start_child/2
            (stdlib) supervisor.erl:348: :supervisor.start_children/3
            (stdlib) supervisor.erl:314: :supervisor.init_children/2
            (stdlib) gen_server.erl:328: :gen_server.init_it/6
            (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3

I am guessing the error indicates that the :gen_server.start_link(..) call inside start_link(name) of my Cache module evaluates to {:ok, #PID<0.128.0>}, which seems to be wrong, but I have no idea where else to put the Node.spawn().

Module Cache:

defmodule Cache do
  use GenServer

  def handle_cast({:put, url, page}, {pages, size}) do
    new_pages = Dict.put(pages, url, page)
    new_size = size + byte_size(page)
    {:noreply, {new_pages, new_size}}
  end

  def handle_call({:get, url}, _from, {pages, size}) do
    {:reply, pages[url], {pages, size}}
  end

  def handle_call({:size}, _from, {pages, size}) do
    {:reply, size, {pages, size}}
  end

  def start_link(name) do
    IO.puts(elem(name, 0))
    Node.spawn(String.to_atom(elem(name, 0)), :gen_server.start_link({:local, elem(name, 1)}, __MODULE__, {HashDict.new, 0}, []))
  end

  def put(name, url, page) do
    :gen_server.cast(name, {:put, url, page})
  end

  def get(name, url) do
    :gen_server.call(name, {:get, url})
  end

  def size(name) do
    :gen_server.call(name, {:size})
  end
end

Module CacheSupervisor:

defmodule CacheSupervisor do
  use Supervisor

  def init(_args) do
    workers = Enum.map(
      [{"node1@DELL_XPS", :de}, {"node1@DELL_XPS", :edu}, {"node2@DELL_XPS", :com}, {"node2@DELL_XPS", :it}, {"node2@DELL_XPS", :rest}],
      fn(n) -> worker(Cache, [n], id: elem(n, 1)) end)
    supervise(workers, strategy: :one_for_one)
  end

  def start_link() do
    :supervisor.start_link(__MODULE__, [])
  end
end

Solution

  • :erlang.spawn(:node1@DELL_XPS, {:ok, #PID<0.128.0>})
    

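    Node.spawn/2 simply delegates to :erlang.spawn/2. It expects a node name (which you have provided) and a zero-arity function to run on that node. Because Elixir evaluates arguments before making the call, your :gen_server.start_link call ran first, on the local node, and returned {:ok, pid} as it should. Node.spawn/2 then received that tuple instead of a function, and since a tuple can't be called like a function, it raised the ArgumentError.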
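    A minimal sketch of that difference, runnable in a single session without a cluster. The node name :"node1@DELL_XPS" is only there to mirror the question's stack trace; the working call targets node() so it runs locally. Passing the result of a call hands Node.spawn/2 a value, while wrapping the work in fn -> ... end defers it to the new process.

```elixir
# Wrong: the tuple is built here, on the calling node, before Node.spawn/2 runs.
# :erlang.spawn/2 rejects anything that is not a function, which Elixir reports
# as ArgumentError (the same error as in the question's stack trace).
bad_result =
  try do
    Node.spawn(:"node1@DELL_XPS", {:ok, self()})
    :spawned
  rescue
    ArgumentError -> :argument_error
  end

# Right: pass a zero-arity anonymous function; its body runs in the new process
# on the target node (the local node here, so no second node is needed).
parent = self()
pid = Node.spawn(node(), fn -> send(parent, {:hello_from, node()}) end)

good_result =
  receive do
    {:hello_from, where} -> where
  after
    1_000 -> :timeout
  end

IO.inspect({bad_result, is_pid(pid), good_result})
```

    In the question's Cache.start_link, the same rule applies: whatever should happen on the remote node has to live inside the function body.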

    I would not recommend spawning processes on separate nodes like this. If the remote node goes down, not only will you lose that node from your cluster, but you will also have to deal with the fallout from all the processes you spawned there. This results in an app that is more brittle than it needs to be.

    If you want your cache GenServers running on multiple nodes, I'd suggest running the application you are building on every node, with an instance of your CacheSupervisor on each node. Each CacheSupervisor then starts its own GenServers underneath it. This is more robust because if a node goes down, the remaining nodes are unaffected. Of course, your application logic will need to take this into account: losing a node could mean losing cache data or client connections. See this answer for more details: How does an Erlang gen_server start_link a gen_server on another node?
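    A minimal sketch of the per-node layout, assuming Elixir 1.5 or later (for child specs and Supervisor.init/2). NodeCache, NodeCacheSupervisor and the assignments list are hypothetical names; the cache keeps the question's {pages, size} state but uses a Map instead of the deprecated HashDict, and node names are atoms rather than strings. Every node runs the same supervisor, which starts only the caches assigned to node(). Cluster-wide lookup and recovery from a lost node are left out.

```elixir
defmodule NodeCache do
  # Hypothetical stand-in for the question's Cache module, started locally.
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, {%{}, 0}, name: name)

  def put(name, url, page), do: GenServer.cast(name, {:put, url, page})
  def get(name, url), do: GenServer.call(name, {:get, url})
  def size(name), do: GenServer.call(name, :size)

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({:put, url, page}, {pages, size}) do
    {:noreply, {Map.put(pages, url, page), size + byte_size(page)}}
  end

  @impl true
  def handle_call({:get, url}, _from, {pages, _size} = state),
    do: {:reply, Map.get(pages, url), state}

  def handle_call(:size, _from, {_pages, size} = state),
    do: {:reply, size, state}
end

defmodule NodeCacheSupervisor do
  # Hypothetical per-node supervisor: the same code runs on every node and
  # only starts the caches assigned to the node it is running on.
  use Supervisor

  # assignments: list of {node_name, cache_name} atoms, e.g. {:"node1@DELL_XPS", :de}
  def start_link(assignments), do: Supervisor.start_link(__MODULE__, assignments)

  @impl true
  def init(assignments) do
    children =
      for {cache_node, cache_name} <- assignments, cache_node == node() do
        # Each cache needs a unique child id, like `id: elem(n, 1)` in the question
        Supervisor.child_spec({NodeCache, cache_name}, id: cache_name)
      end

    Supervisor.init(children, strategy: :one_for_one)
  end
end

# Usage on a single node: only entries whose node matches node() are started.
{:ok, _sup} =
  NodeCacheSupervisor.start_link([
    {node(), :de},
    {node(), :edu},
    {:"node2@DELL_XPS", :com}
  ])

NodeCache.put(:de, "example.de", "<html>example</html>")
page = NodeCache.get(:de, "example.de")
com_pid = Process.whereis(:com)
```

    Because each supervisor only links to processes on its own node, a crashed cache is restarted locally, and a node that goes down takes only its own caches with it.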

    If you really want to spawn a process on a remote node like this, you could do something like the following inside Cache.start_link(name):

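    pid = :erlang.spawn_link(String.to_atom(elem(name, 0)), fn ->
      # Runs on the remote node; the cache is linked to this helper process
      {:ok, _cache_pid} = :gen_server.start_link({:local, elem(name, 1)}, __MODULE__, {HashDict.new, 0}, [])
      receive do
        # Block until an :exit message arrives, i.e. effectively forever
        :exit -> :ok
      end
    end)
    # A supervisor child's start function must return {:ok, pid}, not a bare pid
    {:ok, pid}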
    

    Note that you must use spawn_link, because supervisors expect to be linked to their children. If the supervisor is not linked, it will not know when the child crashes and won't be able to restart it.
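    A small sketch of why the link matters, runnable on a single node: a process that traps exits, as supervisors do, receives an {:EXIT, pid, reason} message when a linked process dies, but hears nothing when an unlinked one does. Real supervisors add restart logic on top; this only shows the signal they depend on.

```elixir
# Supervisors trap exits; emulate that in the current process.
Process.flag(:trap_exit, true)

# Linked child: its abnormal exit arrives as an {:EXIT, pid, reason} message.
linked = spawn_link(fn -> exit(:crashed) end)

linked_signal =
  receive do
    {:EXIT, ^linked, reason} -> reason
  after
    1_000 -> :no_signal
  end

# Unlinked child: it dies just the same, but nobody is told.
unlinked = spawn(fn -> exit(:crashed) end)

unlinked_signal =
  receive do
    {:EXIT, ^unlinked, reason} -> reason
  after
    200 -> :no_signal
  end

# prints {:crashed, :no_signal}
IO.inspect({linked_signal, unlinked_signal})
```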