I have two nodes, let's call them A and B.
B has a GenServer module that monitors A once it is started. This module only exists on B, and the GenServer is started by A when A connects to B.
If A dies while connected to B, B should kill itself using :init.stop().
Here's the code of the GenServer:
    defmodule Monitor do
      use GenServer

      def start_link() do
        GenServer.start_link(__MODULE__, [], [])
      end

      def init([]) do
        # A timeout of 0 delivers :timeout to handle_info/2 right after init.
        {:ok, %{}, 0}
      end

      def handle_info(:timeout, s) do
        start(:"A@127.0.0.1")
        {:noreply, s}
      end

      # Sent by the runtime when a node monitored with Node.monitor/2 goes down.
      def handle_info({:nodedown, node}, state) do
        s_node = node |> to_string

        case s_node do
          "A" <> _ ->
            IO.puts "A is down, killing myself !"
            :init.stop()

          _ ->
            :ko
        end

        {:noreply, state}
      end

      def handle_info(_, s) do
        {:noreply, s}
      end

      def start(node) do
        Node.monitor(node, true)
        IO.puts "Starting to monitor: #{inspect node}"
      end
    end
I start both nodes, A and B, and connect A to B. Then I start the Monitor by running this command on A:

    > :rpc.call(:"B@127.0.0.1", Monitor, :start_link, [])
    {:ok, #PID<8440.594.0>}
If I gracefully disconnect A from B with Node.disconnect, everything works as expected: B detects that node A is down and kills itself.
However, if I kill the console of A with Ctrl-C Ctrl-C, or even Ctrl-g / q, the GenServer on B with pid <0.594.0> no longer exists and therefore can't detect that A is down.
Is the pid "linked" to A for some reason?
PS: I tried with Node.spawn and with calling spawn in the :rpc.call, and I get the same result.
PS 2: If I start the GenServer from the B console, it works as expected when I kill A with either Node.disconnect or Ctrl-C Ctrl-C.
PS 3: I thought that it might come from the fact that I call :start_link, but I have the same behaviour with :start (without link).
Subsidiary question (answered by Hynek -Pichi- Vychodil in the comments):
Why are all the IO.puts printed on A while the GenServer is technically running on B?
When you call :rpc.call(:"B@127.0.0.1", Monitor, :start_link, []), you link your Monitor process to the current shell. The call also sets the group_leader of Monitor to the :user process at A, which basically prevents you from seeing the output of IO.puts "A is down, killing myself !" once A is down. So when A dies, the shell running at A dies as well, and your Monitor dies with it because they are linked.
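One way to check this on a running system is to ask B for the links and the group leader of the Monitor process. The following is only a sketch: ProcDiag and describe/1 are hypothetical names that are not part of the original code, and the module has to be loaded on the node where you call it.

```elixir
defmodule ProcDiag do
  # Hypothetical helper: reports the node a process runs on, what it is
  # linked to, and the node hosting its group leader.
  def describe(pid) when is_pid(pid) do
    # Process.info/2 only accepts local pids, so run it on the pid's own node.
    case :rpc.call(node(pid), Process, :info, [pid, [:links, :group_leader]]) do
      nil ->
        # Process.info/2 returns nil when the process is no longer alive.
        :dead

      {:badrpc, reason} ->
        {:error, reason}

      info ->
        leader = Keyword.fetch!(info, :group_leader)

        %{
          node: node(pid),
          links: Keyword.fetch!(info, :links),
          group_leader_node: node(leader)
        }
    end
  end
end
```

Calling ProcDiag.describe(pid) on A with the pid returned by :rpc.call should show whether group_leader_node is A rather than B, and which processes appear in links.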
So, first of all, you should either use GenServer.start() when starting the server through :rpc.call(), or set the process flag :trap_exit to true, or put it in the supervision tree of an application running at node B. Second, set its group_leader to the user process at node B.
P.S.: I'm not familiar with Elixir syntax so I do not provide code.
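For readers who want something to start from, here is a minimal Elixir sketch of the two suggestions above (start without a link, then point the group leader at B's own user process). MonitorStarter and start_detached/2 are hypothetical names, not part of the original code, and the module is assumed to be loaded on B.

```elixir
defmodule MonitorStarter do
  # Hypothetical helper, meant to live on B and be invoked from A through
  # :rpc.call/4. It starts `module` as an unlinked GenServer and redirects
  # its IO to the local :user process.
  def start_detached(module, arg) do
    # GenServer.start/2 does not link the new process to the caller.
    {:ok, pid} = GenServer.start(module, arg)

    # :user is the locally registered IO server; skip the step if it is absent.
    case Process.whereis(:user) do
      nil -> :ok
      user -> Process.group_leader(pid, user)
    end

    {:ok, pid}
  end
end
```

From A it could be invoked as :rpc.call(:"B@127.0.0.1", MonitorStarter, :start_detached, [Monitor, []]). Anything the server prints before the group leader is swapped may still show up on A; setting the group leader inside init/1 would avoid that, at the cost of changing Monitor itself.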