
BEAM behaviour RPC call between 2 nodes


I have 2 nodes; let's call them A and B.

B has a GenServer module that, when started, monitors A (this module only exists on B). The GenServer is started by A when A connects to B.

If A dies while connected to B, B should kill itself using :init.stop().

Here's the code of the GenServer:

defmodule Monitor do
  use GenServer

  def start_link() do
    GenServer.start_link(__MODULE__, [], [])
  end

  def init([]) do
    {:ok, %{}, 0}
  end

  def handle_info(:timeout, s) do
    start(:"A@127.0.0.1")
    {:noreply, s}
  end
  def handle_info({:nodedown, node}, state) do
    s_node = node |> to_string
    case s_node do
      "A" <> _ ->
        IO.puts "A is down, killing myself !"
        :init.stop()
      _ ->
        :ko
    end
    {:noreply, state}
  end
  def handle_info(_, s) do
    {:noreply, s}
  end

  def start(node) do
    res = Node.monitor(node, true)
    IO.puts "Starting to monitor: #{inspect node}"
  end
end

I start both nodes, A and B, and connect A to B. I then start the Monitor using this command on A:

> :rpc.call(:"B@127.0.0.1", Monitor, :start_link, [])
{:ok, #PID<8440.594.0>}

If I gracefully disconnect A from B with Node.disconnect, everything works as expected: B detects that node A is down and kills itself.

However, if I kill the console of A with Ctrl-C Ctrl-C, or even Ctrl-G / q, the GenServer on B with pid <0.594.0> no longer exists and therefore can't detect A going down. Is the pid "linked" to A for some reason?

PS: I tried with Node.spawn and with calling spawn inside the :rpc.call; I get the same result.

PS 2: If I start the GenServer from the B console, it works as expected whether I take A down with Node.disconnect or with Ctrl-C Ctrl-C.

PS 3: I thought it might be because I call :start_link, but I get the same behaviour with :start (without the link).

Subsidiary question (answered by Hynek -Pichi- Vychodil in the comments):

Why are all the IO.puts outputs printed on A while the GenServer is technically running on B?


Solution

  • When you call :rpc.call(:"B@127.0.0.1", Monitor, :start_link, []), the Monitor process is linked to the current shell on A. Its group_leader is also set to the :user process on A, which is why the output of IO.puts "A is down, killing myself !" cannot be seen once A is down. So when A dies, the shell running on A dies too, and the linked Monitor dies with it. First, either start the process with GenServer.start() when going through :rpc.call(), set the process flag :trap_exit to true, or put it in the supervision tree of an application running on node B. Second, set its group_leader to the :user process on node B.

    P.S.: I'm not familiar with Elixir syntax, so I have not provided code.
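
Since the answer gives no code, here is a minimal sketch of how its two suggestions might look in Elixir. It is an illustration under assumptions, not the answerer's own implementation: the `start/1` function taking the watched node as an argument is hypothetical (the original module hard-codes the node name and exposes `start_link/0`), and it assumes B is a distributed node where the `:user` process is registered, as it normally is when started with `iex --name`.

```elixir
defmodule Monitor do
  use GenServer

  # Hypothetical entry point: an unlinked start, meant to be invoked from A
  # with :rpc.call(:"B@127.0.0.1", Monitor, :start, [node()]).
  def start(watched_node) do
    GenServer.start(__MODULE__, watched_node, [])
  end

  @impl true
  def init(watched_node) do
    # A process started through :rpc.call inherits a group leader on the
    # calling node (A). Point it at B's own :user process instead, so that
    # IO.puts still has somewhere to write after A has gone away.
    case Process.whereis(:user) do
      nil -> :ok
      user -> Process.group_leader(self(), user)
    end

    # Ask the runtime to send {:nodedown, watched_node} to this process.
    # Assumes the local node is alive (distributed).
    Node.monitor(watched_node, true)
    {:ok, watched_node}
  end

  @impl true
  # Matches only when the node that went down is the one being watched.
  def handle_info({:nodedown, node}, node) do
    IO.puts("#{node} is down, stopping this node")
    :init.stop()
    {:noreply, node}
  end

  # Ignore every other message.
  def handle_info(_msg, state), do: {:noreply, state}
end
```

Setting the group leader in `init/1` means it is in place before any `IO.puts` runs in the process. The answer's other option, placing Monitor in the supervision tree of an application on B, would avoid the remote start entirely, at the cost of B needing to know which node to watch.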