java sockets systemd

Wait with systemd until a service socket becomes available and then start a dependent service


Currently I have a slow-starting Java service under systemd which takes about 60 seconds until it opens its HTTP port and serves other clients.

Another service is a client of this one and expects it to be available; otherwise it dies after a certain number of retries. It is also started by systemd. To be clear, it is a service too, but it uses the former like a database.

Can I configure systemd to wait until the first service has made its socket available? (Something like: once the socket is actually listening, the second, client service should start.)


Solution

  • Initialization Process Requires Forking

    systemd waits for a daemon to initialize itself if the daemon forks. In your situation, that is the most direct way to get systemd itself to do the waiting.

    The daemon offering the HTTP service must do all of its initialization in the main process. Once that initialization is done and the socket is listening for connections, it calls fork() and the parent process exits. At that point systemd knows whether your process initialized successfully (exit status 0) or not (non-zero exit status).

    Such a service gets forking as its Type=... value, as follows:

    [Service]
    Type=forking
    ...
    

    Note: If you are writing new code, consider not using fork. systemd already creates a new process for you so you do not have to fork. That was an old System V boot requirement for services.
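
If the daemon cannot be changed to fork by itself (a Java process typically cannot), one possible way to still use Type=forking is a small launcher script that starts the daemon in the background, waits until a readiness probe succeeds, and only then exits. The following is a minimal sketch under stated assumptions, not a definitive implementation: the name launch_and_wait, the PID file argument, the probe string, and the one-second poll interval are all made up for illustration.

```shell
#!/bin/sh
# launch_and_wait PIDFILE MAX_TRIES PROBE DAEMON [ARGS...]
# (hypothetical helper, not part of systemd)
# Starts DAEMON in the background, records its PID in PIDFILE, then runs
# the PROBE command string once per second until it exits 0.
# Returns 0 when the probe succeeds; after MAX_TRIES failed probes it
# kills the daemon and returns 1.
launch_and_wait() {
  pidfile=$1
  max_tries=$2
  probe=$3
  shift 3

  "$@" &                          # start the daemon in the background
  daemon_pid=$!
  echo "$daemon_pid" > "$pidfile"

  tries=0
  while [ "$tries" -lt "$max_tries" ]
  do
    # PROBE is a single string, so it is run through a child shell.
    if sh -c "$probe" >/dev/null 2>&1
    then
      return 0                    # ready: the caller can now exit 0
    fi
    tries=$((tries + 1))
    sleep 1
  done

  kill "$daemon_pid" 2>/dev/null  # never became ready: stop the daemon
  return 1
}
```

A wrapper script would call it, for example, as launch_and_wait /run/A.pid 80 'nc -z localhost 8080' /path/to/service args, and then exit with its return status, while the unit points PIDFile= at the same file. The PID file path, the port, and the try count are hypothetical, and the probe assumes your nc supports -z. Keep the total wait below the unit's TimeoutStartSec=.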

    "Requires" will make sure the process waits

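    The other services have to wait, so they must require the first one and be ordered after it. Requires= on its own only pulls the first service in; it does not delay the start of the second one, so pair it with After=. Say your first service is called A; the dependent unit would contain:

    [Unit]
    ...
    Requires=A.service
    After=A.service
    ...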
    

    Program with Patience in Mind

    Of course, there is always another way, which is for the other services to be patient. That means: try to connect to the HTTP port; if that fails, sleep for a bit (in your case, 1 or 2 seconds would be just fine) and try again, until it works.

    I have implemented both methods and they both work very well.

    Note: A powerful aspect of this method is that if service A gets restarted, you get a new socket, and the client service can auto-reconnect to it when it detects that the old one went down. This means you don't have to restart the other services when restarting service A. I like this method, but it's a bit more work to make sure it's all properly implemented.

    Use the systemd Auto-Restart Feature?

    Another way, maybe, would be to use restart on failure. If the client service attempts to connect to that HTTP service and cannot, it should exit with a failure status, right? systemd can then restart it automatically, over and over, until it succeeds. It's ugly, but if you have no control over the code of those daemons, it's probably the easiest way.

    [Service]
    ...
    Restart=on-failure
    RestartSec=10
    #SuccessExitStatus=3 7   # if success is not always just 0
    ...
    

    This example waits 10 seconds after a failure before attempting to restart.

    Hack (last resort, not recommended)

    You could attempt a hack, although I never recommend such things because sooner or later something happens that breaks them: change the service files so that they sleep 60 and then start the main process. For that, just write a script like so:

    #!/bin/sh
    # Crude fixed delay, then replace this shell with the real service.
    sleep 60
    exec "$@"
    

    Then in the .service files, call that script as in:

    ExecStart=/path/to/script /path/to/service args to service
    

    This runs the script instead of running your code directly. The script first sleeps for 60 seconds and then runs your service. So if for some reason the HTTP service takes 90 seconds this time, the client will still fail.

    Still, this can be useful to know since that script could do all sorts of things, such as use the nc tool to probe the port before actually starting the service process. You could even write your own probing tool.

    #!/bin/sh
    # "probe" stands for any command that exits 0 once the port answers.
    until probe
    do
      sleep 1
    done
    # Replace this shell with the real service.
    exec "$@"
    

    Note, however, that such a loop blocks until probe returns exit code 0; if the HTTP service never comes up, the wrapper waits forever.
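
One way to keep the wait from lasting forever is to cap the number of attempts. The following is a minimal sketch, not a definitive implementation: the helper name wait_until and its argument order are assumptions for illustration, and the probe is whatever command you pass in.

```shell
#!/bin/sh
# wait_until MAX_TRIES DELAY COMMAND [ARGS...]
# (hypothetical helper)
# Runs COMMAND until it exits 0, sleeping DELAY seconds after each
# failed attempt. Returns 0 on success, 1 once MAX_TRIES attempts failed.
wait_until() {
  max_tries=$1
  delay=$2
  shift 2

  tries=0
  while [ "$tries" -lt "$max_tries" ]
  do
    if "$@" >/dev/null 2>&1
    then
      return 0
    fi
    tries=$((tries + 1))
    sleep "$delay"
  done
  return 1
}
```

In a wrapper script, this could be used as wait_until 120 1 nc -z localhost 8080 || exit 1 before the final line that starts the service. The host, port, and limits are hypothetical, and nc -z is assumed to be supported by the installed netcat. Combined with Restart=on-failure, a failed wait then simply leads to another attempt later.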