I'm exploring Elixir/Erlang hot code reloading and trying to understand how hot reload works in Erlang.
One post here gives an overview of hot swapping, and another here, from Elixir, shows a key step of a hot swap.
In addition, I experimented with Erlang ranch, a well-known TCP pool library, to see how a hot swap keeps TCP connections alive in development and deployment environments. The code is on github (part of readme.md is in Chinese). Feel free to test it with mix run or iex -S mix, and then telnet localhost 8000.
One important consequence of hot reloading is that a process is killed when the code it is still running is removed. At that point, I should either give the process a recovery strategy or make sure its code cannot be removed during a hot swap. I think a good practice is to move the logic code into a separate module, apart from the socket connection loop code.
What confuses me is how the Erlang VM recognizes which code version a process is running, and how it removes the old version when new code is loaded.
I also often hear that OTP helps with hot reloading.
This document describes the steps of an upgrade; how does it handle hot reload in a running environment?
Thanks.
The following module is a very basic server that illustrates a code change without the OTP mechanisms. This answer covers Erlang code only.
-module(modtest).
-export([init/0, loop/1]).

%% Start the server; its state is the version it was started with.
init() ->
    spawn(?MODULE, loop, [version()]).

loop(Version) ->
    receive
        reload ->
            io:format("external call state is ~p, current version is ~p~n", [Version, version()]),
            %% Fully qualified call: continues in the latest loaded version of the module.
            ?MODULE:loop(version());
        stop ->
            io:format("stopped~n");
        Message ->
            io:format("receive message ~p~n---> local call state is ~p, current version is ~p~n", [Message, Version, version()]),
            %% Local call: stays in the version the process is already running.
            loop(version())
    after 10000 ->
        io:format("timeout, state is ~p, current version is ~p~n", [Version, version()]),
        loop(version())
    end.

version() -> version1.
First, try version 1 of the module:
1> c(modtest).
{ok,modtest}
2> P = modtest:init().
<0.66.0>
timeout, state is version1, current version is version1
3> P! message1.
receive message message1
---> local call state is version1, current version is version1
message1
timeout, state is version1, current version is version1
4> P ! reload.
external call state is version1, current version is version1
reload
timeout, state is version1, current version is version1
Next, make a huge evolution to the code:
version() -> version2.
Compile the module outside of the VM and go back to the running application
5> % compile outside version 2
timeout, state is version1, current version is version1
5> P! message1.
receive message message1
---> local call state is version1, current version is version1
message1
6> P ! reload.
external call state is version1, current version is version1
reload
7> P! message1.
receive message message1
---> local call state is version1, current version is version1
message1
timeout, state is version1, current version is version1
Nothing happens: the module is not loaded automatically. Let's load it into the VM:
8> % load new version
timeout, state is version1, current version is version1
8> l(modtest).
{module,modtest}
9> P! message1.
receive message message1
---> local call state is version1, current version is version1
message1
10> P ! reload.
external call state is version1, current version is version1
reload
11> P! message1.
receive message message1
---> local call state is version1, current version is version2
message1
12> P! message1.
receive message message1
---> local call state is version2, current version is version2
message1
timeout, state is version2, current version is version2
13> P! stop.
stopped
stop
14>
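As an aside to the session above: the VM keeps at most two versions of a module loaded, the "current" one and the "old" one, and a process stays in the old one until it makes a fully qualified call. A minimal sketch of how to observe this, assuming a hypothetical helper module code_probe (the module and function names are not part of the original example; only erlang:check_process_code/2 and code:soft_purge/1 are standard calls):

```erlang
-module(code_probe).
-export([status/2, safe_purge/1]).

%% Report whether Pid is still executing the old version of Module.
%% erlang:check_process_code/2 returns true when the process runs old code.
status(Pid, Module) ->
    case erlang:check_process_code(Pid, Module) of
        true  -> old_code;
        false -> current_code
    end.

%% Remove the old version only if no process still uses it.
%% code:soft_purge/1 returns false instead of killing such processes,
%% unlike code:purge/1, which kills them.
safe_purge(Module) ->
    case code:soft_purge(Module) of
        true  -> purged;
        false -> still_in_use
    end.
```

In the session above, calling code_probe:status(P, modtest) right after l(modtest) and before sending reload would be expected to report old_code, since P is still waiting in the loop of the previous version. This also matters for the "killed process" concern: the shell's l/1 purges the old version before loading, so loading the module a second time while P is still in old code would kill P.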
In the session above, the new code is picked up after the first "fully qualified" call into the module. Unfortunately, you cannot control when the new code is taken into account. In the example, even though there is a reload message, the new code is only used from the next loop iteration, which is too late if the state data needs to be modified. The next version uses an intermediate fully qualified call so that the state data can be modified.
-module(modtest).
-export([init/0, loop/1, code_change/1]).

init() ->
    spawn(?MODULE, loop, [version()]).

loop(Version) ->
    receive
        reload ->
            %% Fully qualified call: runs code_change/1 from the latest loaded version,
            %% so the state can be adapted before the loop continues in the new code.
            NewVersion = ?MODULE:code_change(Version),
            io:format("external call state is ~p, current version is ~p~n", [Version, NewVersion]),
            ?MODULE:loop(NewVersion);
        stop ->
            io:format("stopped~n");
        Message ->
            io:format("receive message ~p~n---> local call state is ~p, current version is ~p~n", [Message, Version, version()]),
            loop(version())
    after 10000 ->
        io:format("timeout, state is ~p, current version is ~p~n", [Version, version()]),
        loop(version())
    end.

version() -> version3.

code_change(Version) ->
    io:format("it is possible here to do any action on the state: ~p before the code change is completed~n", [Version]),
    % It is possible to have different adaptation depending on the current version
    version().
Check this new version in the VM
1> c(modtest).
{ok,modtest}
2> P = modtest:init().
<0.66.0>
3> P ! message.
receive message message
---> local call state is version3, current version is version3
message
4> P ! message.
receive message message
---> local call state is version3, current version is version3
message
5> P ! reload.
it is possible here to do any action on the state: version3 before the code change is completed
reload
external call state is version3, current version is version3
6> P ! reload.
it is possible here to do any action on the state: version3 before the code change is completed
reload
external call state is version3, current version is version3
timeout, state is version3, current version is version3
7> % new version
Now create a new version:
...
version() -> version4.
...
and go back to the VM
7> c(modtest).
{ok,modtest}
timeout, state is version3, current version is version3
8> P ! message.
receive message message
---> local call state is version3, current version is version3
message
9> P ! message.
receive message message
---> local call state is version3, current version is version3
message
10> P ! reload.
it is possible here to do any action on the state: version3 before the code change is completed
reload
external call state is version3, current version is version4
11> P ! message.
receive message message
---> local call state is version4, current version is version4
message
12> P ! stop.
stopped
stop
13>
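The comment in code_change/1 mentions adapting the state depending on the version the process comes from. A minimal sketch of that idea, assuming a hypothetical module state_migration and made-up state shapes (a bare counter, then a tagged tuple, then a map), none of which come from the original module:

```erlang
-module(state_migration).
-export([code_change/1]).

%% Upgrade whatever state shape the old code carried to the newest one.
%% Each clause handles one historical shape (all hypothetical here).
code_change(Count) when is_integer(Count) ->
    %% oldest shape: a bare counter, first lifted to the tagged tuple
    code_change({state_v2, Count});
code_change({state_v2, Count}) ->
    %% second shape: a tagged tuple, converted to a map with an extra field
    #{vsn => 3, count => Count, since => undefined};
code_change(#{vsn := 3} = State) ->
    %% already the newest shape, nothing to do
    State.
```

Used the same way as in modtest, the reload clause would call ?MODULE:code_change(State) and then continue with ?MODULE:loop(NewState), so a process coming from any earlier version ends up with the newest state shape.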
The second version works as expected. But there is still a huge limitation: the "server" cannot use any other fully qualified call; otherwise, there is no guarantee that the function code_change will be called immediately after the new code is loaded.
This is the behavior that OTP provides during a release upgrade or downgrade (see release_handling).
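To make the OTP side more concrete, here is a minimal sketch of a gen_server with a code_change/3 callback. The module name counter_srv and its state shapes are assumptions made for illustration (an older version is imagined to have kept a bare integer as state); only the gen_server callbacks themselves are standard.

```erlang
-module(counter_srv).
-behaviour(gen_server).

-export([start_link/0, get/1]).
-export([init/1, handle_call/3, handle_cast/2, code_change/3]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Return the current state of the server.
get(Pid) ->
    gen_server:call(Pid, get).

%% The newest state shape is a map.
init([]) ->
    {ok, #{count => 0}}.

handle_call(get, _From, State) ->
    {reply, State, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Called by OTP while the process is suspended during an upgrade,
%% so no other message is handled between loading the code and this call.
code_change(_OldVsn, Count, _Extra) when is_integer(Count) ->
    %% hypothetical old shape: a bare integer
    {ok, #{count => Count}};
code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

During a release upgrade the release handler drives these steps from the upgrade instructions. By hand, roughly the same sequence can be tried with sys:suspend(Pid), loading the new code, sys:change_code(Pid, counter_srv, undefined, []), and finally sys:resume(Pid). Because the process is suspended in between, the limitation described for the hand-written loop does not apply.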