Handling result of start_link/3 when using Supervisor

I have a supervisor set up to supervise a Slack websocket:

children = [
  %{
    id: Slack.Bot,
    start: {Slack.Bot, :start_link, [MyBot, [], "api_token"]}
  }
]
opts = [strategy: :one_for_one, name: MyBot.Supervisor]
Supervisor.start_link(children, opts)

MyBot receives various callbacks when messages arrive via the websocket. This works fine, but there is an additional callback, handle_info/3, that I want to use to handle my own events. To do this, I need to send a message to the process myself.

I see I can get the PID from the result of start_link/3, but this function is called automatically by the Supervisor. How can I get the PID of this process in order to send it a message, while still keeping it supervised? Do I have to implement an extra supervision layer?

Solution

Supervisors, pids and start functions

Supervisors expect start functions to return one of these values:

{:ok, pid}
{:ok, pid, any}
:ignore
{:error, any}
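As a minimal sketch of that contract, here is a hypothetical PingWorker module (not part of Slack.Bot) whose start_link/1 spawns and links a process and returns {:ok, pid}. The supervisor keeps that pid, which it later reports through which_children/1.

```elixir
defmodule PingWorker do
  # Hypothetical worker used only to illustrate the start-function contract:
  # start and link the process, then return {:ok, pid}.
  def start_link(reply_to) do
    pid = spawn_link(fn -> loop(reply_to) end)
    {:ok, pid}
  end

  # Answers every :ping with {:pong, pid} sent to reply_to.
  defp loop(reply_to) do
    receive do
      :ping ->
        send(reply_to, {:pong, self()})
        loop(reply_to)
    end
  end
end

children = [
  %{id: PingWorker, start: {PingWorker, :start_link, [self()]}}
]

# The supervisor calls PingWorker.start_link(self()) and stores the returned pid.
{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
[{PingWorker, worker_pid, :worker, _modules}] = Supervisor.which_children(sup)
```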

In your code, the start function is Slack.Bot.start_link/4, called with three arguments, so the last one (the options) takes its default value, an empty map.

As you noticed, you cannot access the pid because Supervisor.start_link/2 does not give you the results of the children's start functions. In some cases, it makes sense to call Supervisor.start_child/2 instead, which returns the pid of the started child (and additional info, if any). For completeness, the pids of supervised processes can also be queried with Supervisor.which_children/1.
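A small sketch of start_child/2 and which_children/1, assuming an Agent as a stand-in child (Slack.Bot itself needs a token and a network connection). The :counter id is made up for the example.

```elixir
# Start a supervisor with no children, then add one with start_child/2,
# which hands back the child's pid.
{:ok, sup} = Supervisor.start_link([], strategy: :one_for_one)

# Hypothetical child: an Agent holding a counter, standing in for the bot.
spec = %{id: :counter, start: {Agent, :start_link, [fn -> 0 end]}}
{:ok, counter_pid} = Supervisor.start_child(sup, spec)

# which_children/1 returns {id, pid, type, modules} for every child.
children = Supervisor.which_children(sup)
```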

However, a supervisor's role is to supervise processes and restart them when necessary. A restarted process gets a new pid, so a pid is not a reliable way to refer to a process over a long period of time.
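The following sketch, again using a hypothetical Agent child with the made-up id :counter, kills the child and polls which_children/1 until the supervisor reports its replacement. The RestartDemo helper module is illustrative only.

```elixir
defmodule RestartDemo do
  # Hypothetical helper: current pid of a child by id (nil while restarting).
  def current_pid(sup, id) do
    Enum.find_value(Supervisor.which_children(sup), fn
      {^id, pid, _type, _modules} when is_pid(pid) -> pid
      _other -> nil
    end)
  end

  # Poll until the supervisor reports a pid different from old_pid.
  def wait_for_new_pid(sup, id, old_pid, attempts \\ 50)
  def wait_for_new_pid(_sup, _id, _old_pid, 0), do: {:error, :not_restarted}

  def wait_for_new_pid(sup, id, old_pid, attempts) do
    case current_pid(sup, id) do
      pid when is_pid(pid) and pid != old_pid ->
        {:ok, pid}

      _same_or_restarting ->
        Process.sleep(10)
        wait_for_new_pid(sup, id, old_pid, attempts - 1)
    end
  end
end

children = [%{id: :counter, start: {Agent, :start_link, [fn -> 0 end]}}]
{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

# Kill the child; the supervisor restarts it as a brand new process.
old_pid = RestartDemo.current_pid(sup, :counter)
Process.exit(old_pid, :kill)
{:ok, new_pid} = RestartDemo.wait_for_new_pid(sup, :counter, old_pid)
```

Any code that had stored old_pid would now be pointing at a dead process.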

Pids and names

The solution to your problem is to refer to the process by name. The virtual machine maintains a registry that maps names to processes (as well as ports), which lets you refer to processes (and ports) by name instead of by pid (or port reference). The primitive to register a process is Process.register/2. Many functions that expect a pid, such as send/2 and GenServer.call/2, also accept a registered name. Names are unique within a node.
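A minimal sketch of Process.register/2 with a plain spawned process; the name :demo_listener is arbitrary.

```elixir
parent = self()

# A plain process that reports the first message it receives to the parent.
pid =
  spawn(fn ->
    receive do
      msg -> send(parent, {:got, msg})
    end
  end)

# Register it under an atom; the name is unique on this node.
true = Process.register(pid, :demo_listener)

# The registered name can be used instead of the pid.
^pid = Process.whereis(:demo_listener)
send(:demo_listener, :hello)

received =
  receive do
    {:got, msg} -> msg
  after
    1_000 -> :timeout
  end
```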

While the spawn* primitives do not register processes by name, code built on top of them often provides the ability to register a name as part of the start procedure. This is the case for Slack.Bot.start_link/4 as well as Supervisor.start_link/2. In fact, this is what your code does by passing a :name option to Supervisor.start_link/2. By the way, this is unnecessary unless you need to refer to the supervisor process later on, which, judging from your code, is probably not the case.
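For instance, Agent (built on GenServer) accepts a :name option in start_link/2 and registers the process as part of starting it. A sketch with the made-up name DemoCounter:

```elixir
# The :name option registers the Agent while it starts.
{:ok, pid} = Agent.start_link(fn -> 0 end, name: DemoCounter)

# The name can now be used wherever the pid could.
:ok = Agent.update(DemoCounter, &(&1 + 1))
count = Agent.get(DemoCounter, & &1)
```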

The case of Slack.Bot.start_link/4

To be able to refer to your bot process, simply make sure that Slack.Bot.start_link/4 is invoked with a :name option with a name of your choice (an atom), for example MyBot. This is done within the child specification.

children = [
  %{
    id: Slack.Bot,
    start: {Slack.Bot, :start_link, [MyBot, [], "api_token", %{name: MyBot}]}
  }
]
opts = [strategy: :one_for_one]
Supervisor.start_link(children, opts)

As a result, the supervisor will invoke Slack.Bot.start_link/4 with the four provided arguments (MyBot, [], "api_token" and %{name: MyBot}), and Slack.Bot.start_link/4 will register the process under the provided name.

If you choose MyBot as a name as above, you can send it a message with:

Process.send(MyBot, :message_to_bot, [])

or by using Kernel.send/2 primitive:

send(MyBot, :message_to_bot)

It will then be processed by the handle_info/3 callback.
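Since Slack.Bot needs a real token, here is a sketch with a hypothetical FakeBot GenServer standing in for the bot. Its start_link/2 takes an options map with a :name key, mirroring the child spec above, and it uses GenServer's handle_info/2 rather than Slack.Bot's handle_info/3.

```elixir
defmodule FakeBot do
  # Hypothetical stand-in for Slack.Bot; not the library's implementation.
  use GenServer

  # The name comes from the options map given in the child spec.
  def start_link(report_to, %{name: name}) do
    GenServer.start_link(__MODULE__, report_to, name: name)
  end

  @impl true
  def init(report_to), do: {:ok, report_to}

  # Messages sent with send/2 arrive here and are forwarded to report_to.
  @impl true
  def handle_info(message, report_to) do
    send(report_to, {:bot_got, message})
    {:noreply, report_to}
  end
end

children = [
  %{id: FakeBot, start: {FakeBot, :start_link, [self(), %{name: FakeBot}]}}
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

# No pid needed: address the supervised process by its registered name.
send(FakeBot, :message_to_bot)

received =
  receive do
    {:bot_got, msg} -> msg
  after
    1_000 -> :timeout
  end
```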

As a side note, processes with a registered name in an OTP supervision tree should preferably be based on OTP behaviours and let the OTP framework do the registration. In the OTP framework, name registration happens very early in the init phase, and if there is a conflict, the process is stopped and start_link returns an error ({:error, {:already_started, pid}}).
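A sketch of that early failure, assuming an Agent registered under the made-up name DemoUnique. The second start is rejected during registration, so its initial function never replaces the existing state.

```elixir
# First start registers the name.
{:ok, first} = Agent.start_link(fn -> :first end, name: DemoUnique)

# Second start with the same name fails cleanly with already_started.
result = Agent.start_link(fn -> :second end, name: DemoUnique)
```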

Slack.Bot.start_link/4 is indeed based on OTP modules: it is built on the :websocket_client module, which is itself based on :gen_fsm from OTP. However, in its current implementation, instead of passing the name down to :websocket_client.start_link/4, which would pass it down to :gen_fsm.start_link/4, the function registers the name directly with Process.register/2 once the process is running. As a result, if there is a name conflict, the bot might connect to Slack anyway.
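The pitfall can be reproduced with a hypothetical LateRegistration module that starts the process first and registers the name afterwards, as described above. It rescues the ArgumentError raised by Process.register/2 only to make the outcome visible; the {:name_taken, pid} return value is invented for the example.

```elixir
defmodule LateRegistration do
  # Hypothetical start function: spawn first, register the name afterwards.
  def start_link(name) do
    pid = spawn_link(fn -> Process.sleep(:infinity) end)

    try do
      Process.register(pid, name)
      {:ok, pid}
    rescue
      ArgumentError ->
        # The name was taken, but the process is already running
        # (for a bot, it could already be connected).
        {:error, {:name_taken, pid}}
    end
  end
end

{:ok, first} = LateRegistration.start_link(DemoLate)
{:error, {:name_taken, second}} = LateRegistration.start_link(DemoLate)
```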

Asynchronous messages and replies

Both Process.send/3 and the Kernel.send/2 primitive send a message asynchronously. These functions return immediately.

If the first argument is a pid, these functions succeed even if the process is no longer running. If it is an atom, they raise an ArgumentError if no process is registered under that name.
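Both behaviours can be checked with a short sketch; :no_such_registered_name is simply a name nobody registered.

```elixir
# Spawn a process and wait until it has exited.
dead = spawn(fn -> :ok end)
ref = Process.monitor(dead)

receive do
  {:DOWN, ^ref, :process, ^dead, _reason} -> :ok
after
  1_000 -> raise "process did not exit"
end

# Sending to a dead pid still succeeds (send/2 returns the message).
:ping = send(dead, :ping)

# Sending to an unregistered name raises ArgumentError.
outcome =
  try do
    send(:no_such_registered_name, :ping)
  rescue
    ArgumentError -> :not_registered
  end
```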

To get a reply from the bot process, you need some mechanism that tells the bot process where to send the reply. OTP provides this mechanism with gen_server:call/2 and its Elixir counterpart GenServer.call/2, but it is not available here, as it is not part of the Slack.Bot API.

The Erlang way to do this is to send a tuple containing the pid of the caller, typically as its first element. So you would do:

send(MyBot, {self(), :message_to_bot})
receive do result -> result end

The bot then receives the message and replies as follows:

def handle_info({caller, message}, slack, state) do
  ...
  send(caller, result)
  # Slack.Bot callbacks are expected to return {:ok, state}
  {:ok, state}
end

This is a very simplistic version of a call. GenServer.call/2 does more, such as handling timeouts, making sure the response is the result of the call rather than some unrelated message, and making sure the process does not disappear during the call. In this simple version, your code could wait forever for the reply.
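For comparison, a sketch of what GenServer.call/3 provides, using a hypothetical Doubler GenServer (this is not something Slack.Bot offers).

```elixir
defmodule Doubler do
  # Hypothetical GenServer used only to show GenServer.call/3.
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_call({:double, n}, _from, state), do: {:reply, 2 * n, state}
end

{:ok, _pid} = Doubler.start_link([])

# The reply is matched to this call, and the timeout (in ms) is enforced.
doubled = GenServer.call(Doubler, {:double, 21}, 1_000)

# If nothing is registered under the name, the call exits instead of hanging.
missing =
  try do
    GenServer.call(:no_such_server, :anything, 1_000)
  catch
    :exit, {reason, _call} -> reason
  end
```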

To prevent this, you should at least add a timeout and a way to make sure the reply is not an unrelated message, for example:

def call_bot(message) do
  # A unique ref tags this request so its reply can be recognized.
  ref = make_ref()
  send(MyBot, {self(), ref, message})

  receive do
    {:reply, ^ref, result} -> {:ok, result}
  after
    5_000 -> {:error, :timeout}
  end
end

And for the handle_info part, simply send back the opaque ref that was passed in the tuple, along with the result:

def handle_info({caller, ref, message}, slack, state) do
  ...
  send(caller, {:reply, ref, result})
  {:ok, state}
end

make_ref/0 is a primitive that creates a new, unique reference; tagging requests like this is its typical use.
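Putting both halves together, here is a runnable sketch in which a plain spawned process, registered under the made-up name DemoBot, plays the role of the bot's handle_info/3. BotCall and its {:upcase, text} message are hypothetical.

```elixir
defmodule BotCall do
  # Client side, like call_bot/1, with the name and timeout as parameters.
  def call(name, message, timeout \\ 5_000) do
    ref = make_ref()
    send(name, {self(), ref, message})

    receive do
      {:reply, ^ref, result} -> {:ok, result}
    after
      timeout -> {:error, :timeout}
    end
  end

  # Hypothetical stand-in for the bot: echoes the ref back with the result.
  def start_fake_bot(name) do
    pid = spawn(fn -> loop() end)
    true = Process.register(pid, name)
    pid
  end

  defp loop do
    receive do
      {caller, ref, {:upcase, text}} ->
        send(caller, {:reply, ref, String.upcase(text)})
        loop()

      {_caller, _ref, _other} ->
        # Deliberately never answers, so the caller hits its timeout.
        loop()
    end
  end
end

BotCall.start_fake_bot(DemoBot)
ok_result = BotCall.call(DemoBot, {:upcase, "hello"})
timeout_result = BotCall.call(DemoBot, :ignored, 100)
```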
