I'm trying to create a child process for each connection. The problem is that the call won't return until the child process has exited.
-module(nodesupervisor).
-export([start_link/0, init/1, start_child/2]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    {ok, {{one_for_one, 5, 10}, []}}.

start_child(_sup, Socket) -> % pass the returned value of start_link and user's socket
    ChildSpec = {nodemod, {nodemod, start_link_node, [Socket]},
                 permanent, 5000, worker, [nodemod]},
    io:fwrite("supervisor : ~p~n", [supervisor:start_child(_sup, ChildSpec)]). % this won't return until the process finished or failed.
As far as I know, the function start_child should return immediately, but it only returns once the new child has exited.
How do I fix it?
I have tried different implementations and all of them returned only when child process exited.
First of all, your supervisor does not have this line:
-behavior(supervisor).
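...which you should add. The module still compiles without it, but with the attribute in place the compiler warns you if a required callback such as init/1 is missing.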
the function start_child as far as i know should return immediately but it only return if the new child exited.
If your child is a gen_server, the gen_server docs say:
The gen_server process calls Module:init/1 to initialize. To ensure a synchronized startup procedure, start_link/3,4 does not return until Module:init/1 has returned.
gen_server:start_link() returns the pid of the gen_server:
{ok, Pid :: pid()} |
ignore |
{error, Reason :: term()}
and supervisor:start_child() returns
{ok, Child :: child()}
where child() is the pid of the gen_server. So it seems logical that supervisor:start_child() cannot return until gen_server:start_link() returns, and gen_server:start_link() can't return until nodemod:init() returns. I put a timer:sleep(10000) in my child's init() function, and supervisor:start_child() hung for 10 seconds. Is your nodemod a gen_server? Are you doing something in nodemod:init() that hangs?
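If the time does turn out to be spent in init/1, one common way out is to return from init/1 right away and defer the slow work to handle_continue/2 (available since OTP 21). The following is only a sketch under assumptions: the module name slow_child, the is_ready call, and the timer:sleep/1 standing in for the slow work are all hypothetical and not taken from your code.

```erlang
-module(slow_child).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_continue/2, handle_call/3, handle_cast/2]).

%% Hypothetical child; Socket is whatever term the caller passes in.
start_link(Socket) ->
    gen_server:start_link(?MODULE, [Socket], []).

%% Return immediately so that gen_server:start_link/3, and therefore
%% supervisor:start_child/2, can hand back the pid without waiting.
init([Socket]) ->
    {ok, #{socket => Socket, ready => false}, {continue, slow_setup}}.

%% Runs in the child process right after init/1 has returned,
%% before any other message is handled.
handle_continue(slow_setup, State) ->
    timer:sleep(1000),   % stand-in for the slow work (assumption)
    {noreply, State#{ready := true}}.

handle_call(is_ready, _From, State = #{ready := Ready}) ->
    {reply, Ready, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

With this shape, slow_child:start_link(Socket) returns as soon as init/1 does, while gen_server:call(Pid, is_ready) is only answered after handle_continue/2 has finished.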
Here are some other relevant passages from the docs describing dynamic supervisors:
A supervisor can have one of the following restart strategies...:
...
simple_one_for_one - A simplified one_for_one supervisor, where all child processes are dynamically added instances of the same process type, that is, running the same code.
https://www.erlang.org/doc/man/supervisor.html#supervision-principles
...
Notice that when the restart strategy is simple_one_for_one, the list of child specifications must be a list with one child specification only. (The child specification identifier is ignored.) No child process is then started during the initialization phase, but all children are assumed to be started dynamically using start_child/2.
https://www.erlang.org/doc/man/supervisor.html#Module:init-1
...
When started, the supervisor does not start any child processes. Instead, all child processes are added dynamically by calling:
supervisor:start_child(Sup, List)
Sup is the pid, or name, of the supervisor.
List is an arbitrary list of terms, which are added to the list of arguments specified in the child specification.
If the start function is specified as {M, F, A}, the child process is started by calling apply(M, F, A++List).
https://www.erlang.org/doc/design_principles/sup_princ.html#simplified-one_for_one-supervisors
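To make the A ++ List rule concrete, here is a tiny sketch that builds the argument list the same way and applies it by hand. The module args_demo and its functions are hypothetical, written only for illustration; a real supervisor does this internally.

```erlang
-module(args_demo).
-export([start_args/2, show/3]).

%% Mirrors the rule quoted above: the final argument list is A ++ List,
%% where A comes from the child spec and List from supervisor:start_child/2.
start_args(SpecArgs, ExtraArgs) ->
    SpecArgs ++ ExtraArgs.

%% Hypothetical start function. Its arity (3) has to equal
%% length(SpecArgs ++ ExtraArgs), otherwise apply/3 fails with undef.
show(A, B, C) ->
    {A, B, C}.
```

For example, apply(args_demo, show, args_demo:start_args([5], [a_socket, 10])) returns {5,a_socket,10}: one argument from the child spec followed by the two supplied at start time.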
Here's an example of a simple_one_for_one dynamic supervisor:
node_supervisor.erl:
-module(node_supervisor).
-behavior(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link(
        {local, ?MODULE},  % Name to register for this process.
        ?MODULE,           % Module containing callback function init().
        []                 % Args for init().
    ).

init([]) ->
    SupFlags = #{strategy => simple_one_for_one, intensity => 5, period => 10},
    ChildSpecs = [#{id => node,
                    start => {node, start_link_node, [5]},
                    % The line above says that start_link_node() will have arity 1
                    % (one element in the list of args), but supervisor:start_child()
                    % will add args to [5], using [5] ++ list2, where list2 comes
                    % from the second arg to supervisor:start_child().
                    % So the arity of start_link_node() needs to match the number of
                    % args in [5] ++ list2.
                    restart => permanent,
                    shutdown => 5000,
                    type => worker,
                    modules => [node]}],
    {ok, {SupFlags, ChildSpecs}}.
node.erl:
-module(node).
-behavior(gen_server).
-export([init/1, handle_cast/2, handle_call/3]).
-export([start_link_node/3]).

% The arity has to match the number of args specified in the child_spec
% list plus the number of args supplied in the list which is the second
% argument in the call to supervisor:start_child().
start_link_node(Arg1, Socket, Arg3) ->
    gen_server:start_link(
        ?MODULE,               % Module containing the callback functions: init(), handle_cast(), etc.
        [Socket, Arg1, Arg3],  % Args sent to init().
        []                     % Options for starting gen_server.
    ).

init([Socket|_Rest]) ->
    % Or, create the socket here.
    io:format("Starting node child process: ~w~n", [self()]),
    NumberOfTimesDoSomethingWasCalled = 0,
    % 2nd element of outer tuple will be the initial state of the gen_server.
    {ok, {Socket, NumberOfTimesDoSomethingWasCalled}}.

handle_cast(do_something, State) ->
    {Socket, CallCount} = State,
    io:format("do_something with Socket: ~w~n", [Socket]),
    NewCallCount = CallCount+1,
    io:format("CallCount is: ~w~n", [NewCallCount]),
    NewState = {Socket, NewCallCount},
    {noreply, NewState};
handle_cast(something_else, State) ->
    io:format("Doing something else."),
    {noreply, State}.

handle_call(task1, _From, State) ->
    Msg = {hello, 10},
    {reply, Msg, State}.
In the shell:
~/erlang_programs/supervisor_my% erl
Erlang/OTP 24 [erts-12.3.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1]
Eshell V12.3.2 (abort with ^G)
1> c(node).
{ok,node}
2> c(node_supervisor).
{ok,node_supervisor}
3> {ok, Sup} = node_supervisor:start_link().
{ok,<0.95.0>}
4> Socket = a_socket.
a_socket
5> {ok, Node1} = supervisor:start_child(Sup, [Socket, 10]).
Starting node child process: <0.98.0>
{ok,<0.98.0>}
6> gen_server:cast(Node1, do_something).
do_something with Socket: a_socket
ok
CallCount is: 1
7> gen_server:cast(Node1, do_something).
do_something with Socket: a_socket
ok
CallCount is: 2
8> Socket2 = b_socket.
b_socket
9> {ok, Node2} = supervisor:start_child(Sup, [Socket2, 30]).
Starting node child process: <0.103.0>
{ok,<0.103.0>}
10> gen_server:cast(Node2, do_something).
do_something with Socket: b_socket
ok
CallCount is: 1