Hi, I want to implement distributed caches as an exercise. The cache module is based on gen_server. The caches are started by a CacheSupervisor module. At first I tried running it all on one node, which worked well. Now I am trying to distribute my caches across two nodes, which live in two open console windows on my laptop.
PS:
While writing this question I realised that I had forgotten to connect my third window to the other nodes. I fixed that, but I am still getting the original error.
Consoles:
After connecting my nodes I call CacheSupervisor.start_link() in my third window, which results in the following error message.
Error:
** (EXIT from #PID<0.112.0>) shutdown: failed to start child: :de
** (EXIT) an exception was raised:
** (ArgumentError) argument error
erlang.erl:2619: :erlang.spawn(:node1@DELL_XPS, {:ok, #PID<0.128.0>})
(stdlib) supervisor.erl:365: :supervisor.do_start_child/2
(stdlib) supervisor.erl:348: :supervisor.start_children/3
(stdlib) supervisor.erl:314: :supervisor.init_children/2
(stdlib) gen_server.erl:328: :gen_server.init_it/6
(stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
I am guessing that the error indicates that the :gen_server.start_link(..) inside start_link(name) of my Cache module resolves to {:ok, #PID<0.128.0>}, which seems to be incorrect, but I have no idea where else to put the Node.spawn().
Module Cache:
defmodule Cache do
  use GenServer

  def handle_cast({:put, url, page}, {pages, size}) do
    new_pages = Dict.put(pages, url, page)
    new_size = size + byte_size(page)
    {:noreply, {new_pages, new_size}}
  end

  def handle_call({:get, url}, _from, {pages, size}) do
    {:reply, pages[url], {pages, size}}
  end

  def handle_call({:size}, _from, {pages, size}) do
    {:reply, size, {pages, size}}
  end

  def start_link(name) do
    IO.puts(elem(name, 0))
    Node.spawn(String.to_atom(elem(name, 0)), :gen_server.start_link({:local, elem(name, 1)}, __MODULE__, {HashDict.new, 0}, []))
  end

  def put(name, url, page) do
    :gen_server.cast(name, {:put, url, page})
  end

  def get(name, url) do
    :gen_server.call(name, {:get, url})
  end

  def size(name) do
    :gen_server.call(name, {:size})
  end
end
Module CacheSupervisor:
defmodule CacheSupervisor do
  use Supervisor

  def init(_args) do
    workers = Enum.map(
      [{"node1@DELL_XPS", :de}, {"node1@DELL_XPS", :edu}, {"node2@DELL_XPS", :com}, {"node2@DELL_XPS", :it}, {"node2@DELL_XPS", :rest}],
      fn(n) -> worker(Cache, [n], id: elem(n, 1)) end)
    supervise(workers, strategy: :one_for_one)
  end

  def start_link() do
    :supervisor.start_link(__MODULE__, [])
  end
end
:erlang.spawn(:node1@DELL_XPS, {:ok, #PID<0.128.0>})
:erlang.spawn/2 is the same function as Node.spawn/2. It expects a node name (which you have provided) and a function. Your GenServer.start_link call returned {:ok, pid}, as it should, but since a tuple can't be treated as a function, Node.spawn/2 crashes.
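For comparison, a call with the shape Node.spawn/2 expects looks something like this (the function body here is just a made-up example):

    Node.spawn(:node1@DELL_XPS, fn -> IO.puts("running on #{node()}") end)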
I would not recommend spawning processes on separate nodes like this. If the remote node goes down, not only do you lose that node in your cluster, you also have to deal with the fallout from all the processes you spawned onto it, which leaves the app more brittle than it needs to be. If you want your cache GenServers running on multiple nodes, I'd suggest running the application you are building on multiple nodes and having an instance of your CacheSupervisor on each node. Each CacheSupervisor then starts its own GenServers underneath it; a sketch of that layout follows below. This is more robust because if a node goes down the remaining nodes are unaffected. Of course your application logic will need to take this into account, since losing a node could mean losing cache data or client connections. See this answer for more details: How does an Erlang gen_server start_link a gen_server on another node?
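Here is a minimal sketch of that layout, assuming each node runs the same code and reusing the worker/supervise helpers from your current supervisor; the node-to-cache mapping below is made up for illustration:

    defmodule CacheSupervisor do
      use Supervisor

      def start_link() do
        :supervisor.start_link(__MODULE__, [])
      end

      def init(_args) do
        # Each node only supervises the caches assigned to it (mapping is illustrative).
        names =
          case node() do
            :node1@DELL_XPS -> [:de, :edu]
            :node2@DELL_XPS -> [:com, :it, :rest]
            _               -> []
          end

        workers = Enum.map(names, fn(name) -> worker(Cache, [name], id: name) end)
        supervise(workers, strategy: :one_for_one)
      end
    end

    # Cache.start_link then becomes a plain local start, so the supervisor links to it directly:
    def start_link(name) do
      :gen_server.start_link({:local, name}, __MODULE__, {HashDict.new, 0}, [])
    end

You would then call CacheSupervisor.start_link() once on each node. Callers on other nodes can still reach a cache by addressing the registered name together with its node, e.g. :gen_server.call({:de, :node1@DELL_XPS}, {:get, url}).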
If you really, really want to spawn a process on a remote node like this, you could do something like this:
Node.spawn_link(:node1@DELL_XPS, fn ->
  {:ok, _pid} = :gen_server.start_link({:local, elem(name, 1)}, __MODULE__, {HashDict.new, 0}, [])
  receive do
    # Block forever so the link (and the GenServer started above) stays alive
    :exit -> :ok
  end
end)
Note that you must use spawn_link, as supervisors expect to be linked to their children. If the supervisor is not linked it will not know when the child crashes and won't be able to restart the process.
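For completeness, a rough sketch (based on your existing start_link/1, not tested) of how that could be handed back to the supervisor: Node.spawn_link/2 returns a plain pid, so the start function has to wrap it in {:ok, pid} itself.

    def start_link(name) do
      # Links the supervisor to the remote wrapper process and returns {:ok, pid} as supervisors expect.
      pid = Node.spawn_link(String.to_atom(elem(name, 0)), fn ->
        {:ok, _server} = :gen_server.start_link({:local, elem(name, 1)}, __MODULE__, {HashDict.new, 0}, [])
        receive do
          # Block forever; if the GenServer dies, this wrapper dies with it and the supervisor is notified.
          :exit -> :ok
        end
      end)
      {:ok, pid}
    end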