I have two models of listening sockets and acceptors in Erlang:
------------FIRST------------
-module(listeners).
....
start() ->
    {ok, Listen} = gen_tcp:listen(....),
    accept(Listen).
%%%%%%%%%%%%%%%%%%%%%
accept(Listen) ->
    {ok, Socket} = gen_tcp:accept(Listen),
    spawn(fun() -> handle(Socket) end),
    accept(Listen).
%%%%%%%%%%%%%%%%%%%%%
handle(Socket) ->
    ....
---------SECOND----------
-module(listener).
....
start() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).
%%%%%%%%%%%%%
init([]) ->
    {ok, Listen} = gen_tcp:listen(....),
    spawn(fun() -> free_acceptors(5) end),
    {ok, {{simple_one_for_one, 5, 1}, [{child, {?MODULE, accept, [Listen]}, ....}]}}.
%%%%%%%%%%%%%
free_acceptors(N) ->
    [supervisor:start_child(?MODULE, []) || _ <- lists:seq(1, N)],
    ok.
%%%%%%%%%%%%%
accept(Listen) ->
    {ok, Socket} = gen_tcp:accept(Listen),
    handle(Socket).
%%%%%%%%%%%%%%
handle(Socket) ->
    ....
The first code is simple: the main process creates a listen socket and waits to accept new connections; when a connection comes in, it accepts it, spawns a new process to handle it, and returns to accept further connections.
The second code is also simple: the main process creates a supervision tree. The supervisor creates a listen socket and starts 5 children (spawning a new process to run free_acceptors/1, because that function calls the supervisor process while the supervisor is still in its init function and cannot start children before it has itself started, so the new process waits for the supervisor to finish its initialization), passing the listen socket as an argument to its children, and the five children start waiting to accept new incoming connections at the SAME time.
Now we run the two codes, each on a separate machine whose CPU has a single core, and 5 clients try to connect at the same time to the first server and another 5 to the second server. At first glance, I thought the second server would be faster because all connections would be accepted in parallel at the same time, whereas with the first code the fifth client has to wait for the server to accept the preceding four before it is accepted itself, and so on.
But looking more deeply at the ERTS, we have a single OS thread per core to handle Erlang processes, and since a socket is an OS structure, gen_tcp:listen will call OS-Thread:listen (this is just pseudo code to make the idea clear) to create an OS socket, and gen_tcp:accept calls OS-Thread:accept to accept a new connection. The latter can accept just one connection at a time, so the fifth client still has to wait for the server to accept the preceding four. So is there any difference between the two codes? I hope you understand me.
Even if the code didn't involve sockets, the Erlang processes would always be concurrent rather than parallel because there is just one core, but the scheduler switches between processes so quickly that it comes close to parallel execution; so the problem lies in the use of sockets, which go through OS calls on the single OS thread.
NOTE: Ejabberd uses the first implementation and Cowboy uses the second.
At the OS level, a listen socket has an associated queue of OS threads waiting to accept connections; whether this queue has OS threads blocked on it or is empty only changes how it is handled (busy-waiting non-blocking accept, select, epoll...).
The BEAM does not have a single OS thread even if you run it on a system with a single CPU; it has different types of OS threads.
Regarding your question, I suspect that it will be, if anything, better to have multiple acceptor Erlang processes continuously blocking on the gen_tcp:accept call, because that way the ERTS knows there is Erlang code willing to accept more connections (the handle(Socket) in your second example should spawn a worker, or send the accepted socket to a worker, and get back to accepting connections), while with the single accept-spawn loop this knowledge is hidden.
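A minimal sketch of that acceptor shape (the names and the echo worker are only illustrative, and it assumes the listen socket was opened with {active, false}):

acceptor(Listen) ->
    {ok, Socket} = gen_tcp:accept(Listen),
    %% Hand the socket off to a fresh worker, then loop straight back to accept.
    Worker = spawn(fun() -> receive go -> worker(Socket) end end),
    ok = gen_tcp:controlling_process(Socket, Worker),
    Worker ! go,
    acceptor(Listen).

worker(Socket) ->
    %% Placeholder handling: echo until the peer closes.
    case gen_tcp:recv(Socket, 0) of
        {ok, Data}      -> gen_tcp:send(Socket, Data), worker(Socket);
        {error, closed} -> ok
    end.

If these acceptors are the children of your simple_one_for_one supervisor, the child start function should spawn the acceptor and return {ok, Pid} rather than blocking in gen_tcp:accept itself.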
I'm not familiar enough with the code to know the nuances, but it seems to handle multiple accepts nicely, queueing them internally, so it might be marginally better to have multiple acceptors.
I.e. in the first example, with a single request there is a moment when nobody is accepting connections, while in the second example you need a higher number of simultaneous requests for this to happen.