erlang gen-server

Erlang: starting gen_server on another node fails after init


I am stuck trying to run a gen_server on another node. I have a common gen_server module that looks like this:

start(FileName) ->
  start_link(node(), FileName).

start_link(ThisNode, FileName) ->
  gen_server:start_link({local, ?MODULE}, ?MODULE, [ThisNode, FileName], []).

init([ThisNode, FileName]) ->
  process_flag(trap_exit, true),
  {ok, Terms} = file:consult(FileName),
  {A1, B1, C1} = lists:nth(1,Terms),
  place_objects(A1, B1, C1).

Now I want to start multiple nodes that all run the same gen_server and communicate with each other, and use another node to orchestrate them. (All of these nodes are started in terminals on my local machine.)

I start a new node in one terminal using erl -sname bar, where I intend to run the gen_server, and compile the gen_server module on this node. Then I start another node called 'sup', which I intend to use as a coordinator for all the other nodes. If I run my_gen_server:start("config_bar.txt"). on bar, it returns successfully. But when I run rpc:call('bar@My-MacBook-Pro', my_gen_server, start, ["config_bar.txt"]). on sup, init/1 returns successfully (I confirmed this with log statements), and immediately afterwards I get this error:

{ok,<9098.166.0>}
(sup@My-MacBook-Pro)2> =ERROR REPORT==== 21-Feb-2022::11:12:30.443051 ===
** Generic server my_gen_server terminating 
** Last message in was {'EXIT',<9098.165.0>,
                               {#Ref<0.3564861827.2990800899.137513>,return,
                                {ok,<9098.166.0>}}}
** When Server state == {10,10,#Ref<9098.1313723616.3973185546.82660>,
                         'bar@My-MacBook-Pro'}
** Reason for termination ==
** {#Ref<0.3564861827.2990800899.137513>,return,{ok,<9098.166.0>}}

=CRASH REPORT==== 21-Feb-2022::11:12:30.443074 ===
  crasher:
    initial call: my_gen_server:init/1
    pid: <9098.166.0>
    registered_name: my_gen_server
    exception exit: {#Ref<0.3564861827.2990800899.137513>,return,
                     {ok,<9098.166.0>}}
      in function  gen_server:decode_msg/9 (gen_server.erl, line 481)
    ancestors: [<9098.165.0>]
    message_queue_len: 0
    messages: []
    links: []
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 1598
    stack_size: 29
    reductions: 3483
  neighbours:

I can't seem to figure out what causes the error and if there's anything I need to add to my gen_server code to fix it. Would really appreciate some help on this one!


Solution

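  • The gen_server on the remote node is linked to an ephemeral process created for the rpc call, because gen_server:start_link links the new server to whichever process calls it. That ephemeral process exits with a reason other than normal (the reason carries the actual result of the rpc call), so the exit signal propagates to the gen_server. Since the gen_server traps exits, the signal arrives as an {'EXIT', Pid, Reason} message, and a gen_server that receives such a message from its parent terminates with that reason. This is the "Last message in" shown in the error report, and the sending pid is the one listed under ancestors.

    You can use gen_server:start instead of gen_server:start_link or, if you want the gen_server to be part of the supervision tree, instruct its supervisor to spawn it: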

    rpc:call('bar@My-MacBook-Pro',  my_gen_sup, start_child, ["config_bar.txt"]).
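The my_gen_sup module used in that call is not shown anywhere in the question, so here is a minimal sketch of what it could look like. The supervisor strategy, the restart type and the module layout are assumptions for illustration; the only parts taken from the question are my_gen_server:start_link/2 and the start_child name.

```erlang
%% Hypothetical supervisor for my_gen_server (a sketch, not the asker's code).
-module(my_gen_sup).
-behaviour(supervisor).

-export([start_link/0, start_child/1]).
-export([init/1]).

%% Start the supervisor, registered locally as my_gen_sup.
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Ask the running supervisor to start the gen_server. The child is
%% linked to the supervisor, not to the process that calls this function.
start_child(FileName) ->
    supervisor:start_child(?MODULE, [FileName]).

init([]) ->
    %% Assumption: simple_one_for_one, so children are started on demand
    %% and the arguments passed to start_child/1 are appended to the
    %% argument list below, giving my_gen_server:start_link(Node, FileName).
    SupFlags = #{strategy => simple_one_for_one,
                 intensity => 1,
                 period => 5},
    ChildSpec = #{id => my_gen_server,
                  start => {my_gen_server, start_link, [node()]},
                  %% Assumption: restart only after an abnormal exit.
                  restart => transient},
    {ok, {SupFlags, [ChildSpec]}}.
```

The supervisor itself has to be started by something long-lived on bar, for example an application's supervision tree or, for quick experiments, bar's own shell. Starting it with rpc:call and start_link would link it to the same kind of short-lived process and reproduce the original problem. Note also that the gen_server in the question registers itself locally under a single name, so a second start_child call on the same node is expected to return an error mentioning already_started.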