
How does the Erlang VM track the code version in a process, and what does OTP do in terms of hot reload?


I'm exploring Elixir/Erlang hot reload and trying to understand how Erlang hot reload works.

One post here gives an overview of hot swapping, and another here, from Elixir, describes a key step of the hot swap.

I also experimented with Ranch, the well-known Erlang TCP acceptor pool library, to see how hot swapping keeps TCP connections alive in development and deployment environments. The code is on GitHub (the readme.md contains some Chinese; feel free to try it with mix run or iex -S mix, then telnet localhost 8000).

An important consequence of hot reload is that a process is killed when the code it is still running is removed. At that point I should either give the process a recovery strategy or make sure the code cannot be removed during the hot swap. I think a good practice is to move the logic code into a separate module from the socket connection loop code.
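
The Erlang VM keeps at most two versions of a module loaded, the current one and the old one, and purging the old one kills any process still running it. Below is a minimal sketch of the "make sure the code cannot be removed" option, assuming a hypothetical helper module named purge_check. It relies on code:soft_purge/1, which only removes old code when no process uses it, and on erlang:check_process_code/2, which reports whether a given process still runs old code.

```erlang
-module(purge_check).

-export([runs_old_code/2, safe_reload/1]).

%% Returns true if Pid is still executing the old version of Module.
runs_old_code(Pid, Module) ->
    erlang:check_process_code(Pid, Module).

%% Loads a new version of Module only if the old version can be
%% removed without killing any process (hypothetical helper).
safe_reload(Module) ->
    case code:soft_purge(Module) of
        true ->
            %% No process uses the old code: loading is safe.
            code:load_file(Module);
        false ->
            %% Some process still runs the old code: refuse to load.
            {error, old_code_in_use}
    end.
```

With this approach a long-running loop that never makes a fully qualified call blocks every later reload, which is one reason to keep the connection loop small and put the logic in a separate module.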

What confuses me is this: how does the Erlang VM recognize which code version a process is running, and how does it remove the old version when new code is loaded?

I also often hear that OTP helps with hot reload.
This document describes the steps of an upgrade; how does it handle hot reload in a running environment?

Thanks.


Solution

  • The following module is a very basic server that illustrates code change without any OTP mechanism. This answer covers Erlang code only.

    -module (modtest).
    
    -export ([init/0,loop/1]).
    
    init() ->
        spawn(?MODULE, loop, [version()]).
    
    loop(Version) ->
        receive
            reload ->
                io:format("external call state is ~p, current version is ~p~n",[Version,version()]),
                % fully qualified call: continues in the newest loaded version of the module
                ?MODULE:loop(version());
            stop ->
                io:format("stopped~n");
            Message ->
                io:format("receive message ~p~n---> local call state is ~p, current version is ~p~n",[Message,Version,version()]),
                % local call: stays in the version this process is already running
                loop(version())
        after 10000 ->
            io:format("timeout, state is ~p, current version is ~p~n",[Version,version()]),
            loop(version())
        end.
    
    version() -> version1.
    

    First, try the module in version 1

    1> c(modtest).
    {ok,modtest}
    2> P = modtest:init().
    <0.66.0>
    timeout, state is version1, current version is version1
    3> P! message1.
    receive message message1
    ---> local call state is version1, current version is version1
    message1
    timeout, state is version1, current version is version1
    4> P ! reload.
    external call state is version1, current version is version1
    reload
    timeout, state is version1, current version is version1
    

    Next, make a "huge" change to the module:

    version() -> version2.
    

    Compile the module outside of the VM and go back to the running application

    5> % compile outside version 2
    timeout, state is version1, current version is version1
    5> P! message1.               
    receive message message1
    ---> local call state is version1, current version is version1
    message1
    6> P ! reload.                
    external call state is version1, current version is version1
    reload
    7> P! message1.               
    receive message message1
    ---> local call state is version1, current version is version1
    message1
    timeout, state is version1, current version is version1
    

    Nothing happens: the module is not loaded automatically. Let's load the module into the VM:

    8> % load new version
    timeout, state is version1, current version is version1
    8> l(modtest).
    {module,modtest}
    9> P! message1.      
    receive message message1
    ---> local call state is version1, current version is version1
    message1
    10> P ! reload.       
    external call state is version1, current version is version1
    reload
    11> P! message1.
    receive message message1
    ---> local call state is version1, current version is version2
    message1
    12> P! message1.
    receive message message1
    ---> local call state is version2, current version is version2
    message1
    timeout, state is version2, current version is version2
    13> P! stop.   
    stopped
    stop
    14>
    

    Good: the new code takes effect after the first "fully qualified" call into the module. Unfortunately, you cannot control when the new code is taken into account. In this example, even though there is a reload message, the new code only starts running at the next loop iteration, which is too late if the state data needs to be modified for the new version. The next code uses an intermediate fully qualified call so that the state data can be modified.

    -module (modtest).
    
    -export ([init/0,loop/1,code_change/1]).
    
    init() ->
        spawn(?MODULE, loop, [version()]).
    
    loop(Version) ->
        receive
            reload ->
                NewVersion = ?MODULE:code_change(Version),
                io:format("external call state is ~p, current version is ~p~n",[Version,NewVersion]),
                ?MODULE:loop(NewVersion);
            stop ->
                io:format("stopped~n");
            Message ->
                io:format("receive message ~p~n---> local call state is ~p, current version is ~p~n",[Message,Version,version()]),
                loop(version())
        after 10000 ->
            io:format("timeout, state is ~p, current version is ~p~n",[Version,version()]),
            loop(version())
        end.
    
    version() -> version3.
    
    code_change(Version) ->
        io:format("it is possible here to do any action on the state: ~p before the code change is completed~n",[Version]),
        % It is possible to have different adaptation depending on the current version
        version().
    

    Check this new version in the VM

    1> c(modtest).
    {ok,modtest}
    2> P = modtest:init().
    <0.66.0>
    3> P ! message.
    receive message message
    ---> local call state is version3, current version is version3
    message
    4> P ! message.
    receive message message
    ---> local call state is version3, current version is version3
    message
    5> P ! reload.
    it is possible here to do any action on the state: version3 before the code change is completed
    reload
    external call state is version3, current version is version3
    6> P ! reload.
    it is possible here to do any action on the state: version3 before the code change is completed
    reload
    external call state is version3, current version is version3
    timeout, state is version3, current version is version3
    7> % new version
    

    Create a new version:

    ...
    version() -> version4.
    ...
    

    and go back to the VM

    7> c(modtest).        
    {ok,modtest}
    timeout, state is version3, current version is version3
    8> P ! message.
    receive message message
    ---> local call state is version3, current version is version3
    message
    9> P ! message.
    receive message message
    ---> local call state is version3, current version is version3
    message
    10> P ! reload.  
    it is possible here to do any action on the state: version3 before the code change is completed
    reload
    external call state is version3, current version is version4
    11> P ! message.
    receive message message
    ---> local call state is version4, current version is version4
    message
    12> P ! stop.   
    stopped
    stop
    13>
    

    Good, it works as expected. But there is still a huge limitation: the "server" cannot use any other fully qualified call; otherwise, there is no guarantee that the function code_change will be called immediately after the new code is loaded.

    This is the behavior that OTP provides during a release upgrade or downgrade (see release_handling).
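
    As a sketch of what an OTP version of this server might look like, the module below keeps the same version atom as the state of a gen_server. The module name modtest_server and the get_state/1 helper are assumptions made for illustration; they are not part of the original answer. The point is the code_change/3 callback: OTP calls it while the process is suspended, so the state can be converted before the server handles any further message.

    ```erlang
    -module(modtest_server).
    -behaviour(gen_server).

    -export([start_link/0, get_state/1]).
    -export([init/1, handle_call/3, handle_cast/2, code_change/3]).

    %% Starts the server (hypothetical API, for illustration only).
    start_link() ->
        gen_server:start_link(?MODULE, [], []).

    %% Returns the current state of the server.
    get_state(Pid) ->
        gen_server:call(Pid, get_state).

    init([]) ->
        {ok, version1}.

    handle_call(get_state, _From, State) ->
        {reply, State, State}.

    handle_cast(_Msg, State) ->
        {noreply, State}.

    %% Invoked through the sys module while the process is suspended.
    %% Here the old state version1 is converted to version2; any other
    %% state is kept unchanged.
    code_change(_OldVsn, version1, _Extra) ->
        {ok, version2};
    code_change(_OldVsn, State, _Extra) ->
        {ok, State}.
    ```

    To run the callback by hand from the shell, start the server, then call sys:suspend(P), sys:change_code(P, modtest_server, undefined, []) and sys:resume(P) in that order. After that, modtest_server:get_state(P) returns version2. The release handler performs comparable steps around loading the new code during an upgrade.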