I am creating multiple GenServers that gossip by sending messages to each other. I have set an exit condition so that every process dies once it has received 10 messages. Each GenServer is created at the beginning of the gossip in the launch function.
defmodule Gossip do
  use GenServer

  # starting gossip
  def start_link(watcher \\ nil), do: GenServer.start_link(__MODULE__, watcher)
  def init(watcher), do: {:ok, {[], 0, watcher}}

  def launch(n, watcher \\ nil) do
    crew = for _ <- 0..n, do: elem(Gossip.start_link(watcher), 1)
    Enum.map(crew, &add_crew(&1, crew -- [&1]))

    crew
    |> hd()
    |> Gossip.send_msg()
  end

  # client side
  def add_crew(pid, crew), do: GenServer.cast(pid, {:add_crew, crew})
  def rcv_msg(pid, msg \\ ""), do: GenServer.cast(pid, {:rcv_msg, msg})
  def send_msg(pid, msg \\ ""), do: GenServer.cast(pid, {:send_msg, msg})

  # server side
  def handle_cast({:add_crew, crew}, {_, msg_counter, watcher}), do:
    {:noreply, {crew, msg_counter, watcher}}

  def handle_cast({:rcv_msg, _msg}, {crew, msg_counter, watcher}) do
    if msg_counter < 10 do
      send_msg(self())
    else
      GossipWatcher.increase(watcher)
      IO.inspect(self(), label: "exit of:") |> Process.exit(:normal)
    end
    {:noreply, {crew, msg_counter + 1, watcher}}
  end

  def handle_cast({:send_msg, _}, {[], _, _}), do: Process.exit(self(), "crew empty")

  def handle_cast({:send_msg, _msg}, {crew, msg_counter, watcher} = state) do
    rcpt = Enum.random(crew) ## recipient of the msg
    if Process.alive?(rcpt) do
      IO.inspect({self(), rcpt}, label: "send message from/to")
      rcv_msg(rcpt, "ChitChat")
      send_msg(self())
      {:noreply, state}
    else
      IO.inspect(rcpt, label: "recipient is dead:")
      {:noreply, {crew -- [rcpt], msg_counter, watcher}}
    end
  end
end

defmodule GossipWatcher do
  use GenServer

  def start_link(opt \\ []), do: GenServer.start_link(__MODULE__, opt)
  def init(opt), do: {:ok, {0}}
  def increase(pid), do: GenServer.cast(pid, {:increase})

  def handle_cast({:increase}, {counter}), do:
    IO.inspect({:noreply, {counter + 1}}, label: "toll of dead")
end
I use the module GossipWatcher to count the GenServers that die after having received 10 messages. The issue is that the iex prompt comes back while some GenServers are still alive. For example, out of 1000 GenServers, only ~964 die by the end of the gossip.
iex(15)> {:ok, watcher} = GossipWatcher.start_link
{:ok, #PID<0.11163.0>}
iex(16)> Gossip.launch 100, watcher
send message from/to: {#PID<0.11165.0>, #PID<0.11246.0>}
:ok
send message from/to: {#PID<0.11165.0>, #PID<0.11167.0>}
send message from/to: {#PID<0.11246.0>, #PID<0.11182.0>}
send message from/to: {#PID<0.11165.0>, #PID<0.11217.0>}
...
toll of dead: {:noreply, {960}}
toll of dead: {:noreply, {961}}
toll of dead: {:noreply, {962}}
toll of dead: {:noreply, {963}}
toll of dead: {:noreply, {964}}
iex(17)>
Am I missing something here? Is the process timing out? Any help would be appreciated.
TIA.
The part of your code that can play some tricks is here:
def handle_cast({:send_periodic_message}, zero_counter_gossip_true) do
  ...
  if (Process.alive?(rcpt)) == true do
    ...
  else
    IO.inspect(rcpt, label: "recipient is dead:")
    {:noreply, {crew -- [rcpt], msg_counter, watcher}}
  end
end
In the else branch, you allow the GenServer to stop working: since it sends a message neither to a neighbor nor to itself, no further action is triggered and it simply stops doing anything.
In the worst (and unlikely) case: if you start 2000 GenServers and launch the gossip from one of them, and this first one only talks to a second one, which in turn only talks to the first one, then only one GenServer is going to die, and you get back the command prompt with 1999 GenServers still alive but doing nothing (since they are receiving no messages).
Even if this case is far-fetched, it shows that the gossip can end prematurely, before every GenServer has received 10 messages. Hence the behavior you describe.
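One possible way to avoid the stall, as a minimal sketch under assumptions: after pruning the dead recipient, the server casts a send to itself again, so its loop keeps running. GossipSketch is a hypothetical, trimmed-down variant of the Gossip module (no watcher, no logging, state reduced to {crew, msg_counter}). Stopping with {:stop, :normal, state} instead of Process.exit/2 is a substitution made here, not something taken from the original code.

```elixir
defmodule GossipSketch do
  use GenServer

  # Hypothetical simplified variant of Gossip: state is {crew, msg_counter}.
  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok), do: {:ok, {[], 0}}

  # Starts n servers, tells each one about its peers, and kicks off the gossip.
  # Returns the list of pids so the caller can inspect them afterwards.
  def launch(n) do
    crew = for _ <- 1..n, do: elem(start_link(), 1)
    Enum.each(crew, &add_crew(&1, crew -- [&1]))
    send_msg(hd(crew))
    crew
  end

  def add_crew(pid, crew), do: GenServer.cast(pid, {:add_crew, crew})
  def rcv_msg(pid), do: GenServer.cast(pid, :rcv_msg)
  def send_msg(pid), do: GenServer.cast(pid, :send_msg)

  @impl true
  def handle_cast({:add_crew, crew}, {_, count}), do: {:noreply, {crew, count}}

  # Same exit condition as the question: stop once 10 messages were counted.
  def handle_cast(:rcv_msg, {_crew, count} = state) when count >= 10 do
    {:stop, :normal, state}
  end

  def handle_cast(:rcv_msg, {crew, count}) do
    send_msg(self())
    {:noreply, {crew, count + 1}}
  end

  # No peers left: stop cleanly instead of idling forever.
  def handle_cast(:send_msg, {[], _count} = state), do: {:stop, :normal, state}

  def handle_cast(:send_msg, {crew, count} = state) do
    rcpt = Enum.random(crew)

    if Process.alive?(rcpt) do
      rcv_msg(rcpt)
      send_msg(self())
      {:noreply, state}
    else
      # Key change: drop the dead peer AND re-queue a send,
      # so this server keeps gossiping with the remaining peers.
      send_msg(self())
      {:noreply, {crew -- [rcpt], count}}
    end
  end
end
```

With this change a server only goes idle when its crew is empty. It does not strictly guarantee that every server receives 10 messages (the last survivor stops because it has no peers left), but a server no longer goes silent just because one random pick happened to be dead.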
I ran some tests, rewriting your code and using a second type of GenServer to monitor how many GenServers are killed and how many survive. It turns out that out of 1000 GenServers, an average of 40 are still alive after I get back the iex prompt.
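A simpler way to get the same kind of number, as a rough sketch: if you keep the list of pids created in launch (an assumption, since the original launch returns :ok rather than the crew), you can tally them with Process.alive?/1 once the output has settled. SurvivorCount and tally/1 are hypothetical names used only for illustration.

```elixir
defmodule SurvivorCount do
  # Hypothetical helper: returns {alive, dead} counts for a list of local pids.
  def tally(pids) do
    alive = Enum.count(pids, &Process.alive?/1)
    {alive, length(pids) - alive}
  end
end
```

Note that Process.alive?/1 only works for pids on the local node, and the result is a snapshot: servers that are still busy gossiping may die right after the count is taken.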