
Erlang node not responding


I received the following message in the Erlang console on the first@localhost node:

=ERROR REPORT==== 1-Jan-2011::23:19:28 ===
** Node 'second@localhost' not responding **
** Removing (timedout) connection **

My question is: what is the timeout in this case? How much time passes before this event is triggered? How can I prevent this "horror"? I can only get back to normal operation by restarting the node, but what is the right way?

Thank you, and Happy New Year!


Solution

  • Grepping the Erlang source code for the "not responding" string shows that the message is generated in the dist_util module of the kernel application (in the con_loop function):

        {error, not_responding} ->
            error_msg("** Node ~p not responding **~n"
                  "** Removing (timedout) connection **~n",
                  [Node]),
    

    The same module contains the following comments, which explain the logic behind ticks and unresponsive nodes:

    %%
    %% Send a TICK to the other side.
    %%
    %% This will happen every 15 seconds (by default) 
    %% The idea here is that every 15 secs, we write a little 
    %% something on the connection if we haven't written anything for 
    %% the last 15 secs.
    %% This will ensure that nodes that are not responding due to 
    %% hardware errors (Or being suspended by means of ^Z) will 
    %% be considered to be down. If we do not want to have this  
    %% we must start the net_kernel (in erlang) without its 
    %% ticker process, In that case this code will never run 
    
    %% And then every 60 seconds we also check the connection and 
    %% close it if we havn't received anything on it for the 
    %% last 60 secs. If ticked == tick we havn't received anything 
    %% on the connection the last 60 secs. 
    
    %% The detection time interval is thus, by default, 45s < DT < 75s 
    
    %% A HIDDEN node is always (if not a pending write) ticked if 
    %% we haven't read anything as a hidden node only ticks when it receives 
    %% a TICK !! 
    

    Hope this helps a bit.
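
To make the numbers in those comments concrete: the 60 seconds is the kernel parameter net_ticktime, a tick is sent every quarter of it, and a silent peer is dropped somewhere between T - T/4 and T + T/4 seconds. Below is a minimal sketch of that arithmetic together with the standard net_kernel calls for reading and changing the value at runtime. The module name ticktime_sketch and its function names are made up for illustration; only net_kernel:get_net_ticktime/0 and net_kernel:set_net_ticktime/1 come from OTP.

```erlang
%% Hypothetical helper module, for illustration only.
-module(ticktime_sketch).
-export([detection_window/1, current_ticktime/0, raise_ticktime/1]).

%% Detection window, in seconds, for a given net_ticktime T.
%% A tick is sent every T/4 seconds, so a silent peer is dropped
%% after more than T - T/4 and less than T + T/4 seconds.
detection_window(TickTime) when is_integer(TickTime), TickTime > 0 ->
    Quarter = TickTime / 4,
    {TickTime - Quarter, TickTime + Quarter}.

%% Read the value currently in effect on this node.
%% On a node started without -sname/-name this returns 'ignored'.
current_ticktime() ->
    net_kernel:get_net_ticktime().

%% Ask the local node to switch to a larger net_ticktime (seconds).
%% The result reports whether a change was started or was unnecessary.
raise_ticktime(Seconds) when is_integer(Seconds), Seconds > 0 ->
    net_kernel:set_net_ticktime(Seconds).
```

With the default of 60, ticktime_sketch:detection_window(60) gives {45.0,75.0}, matching the "45s < DT < 75s" comment above. The same parameter can be set at startup, for example erl -sname first -kernel net_ticktime 120, and it should be the same on all connected nodes. A larger value only makes the nodes tolerate longer silences (a suspended VM, an overloaded machine); it does not repair a connection that is really down. After a connection has been removed, net_adm:ping('second@localhost') tries to set it up again and returns pong on success, so restarting the node should not be needed for the connection alone.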