python linux netstat

Is netstat -lntp unreliable for detecting whether a process listening on a port is down?


I have written a script that works like supervisord: it detects whether a process is down and, if so, starts it. Sometimes the script thought the process was down even though it was still running.

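import logging
import os

logger = logging.getLogger(__name__)  # assumed; the original script sets up its own logger


def check_status(service, port):
    """Check whether a service is listening on a port.

    Args:
        service: the name of the service, as shown in netstat's program column.
        port: the TCP port the service is expected to listen on.

    Returns:
        True if netstat reports a matching listening socket, otherwise False.
    """
    # Note: grep matches substrings, so port 8083 also matches 18083 or a PID
    # containing 8083; netstat -p shows other users' program names only to root.
    cmd = "netstat -lntp | grep %s | grep %s | awk -F '[:]' '{print $2}'" % (service, port)
    logger.info(cmd + "\n")
    results = os.popen(cmd).readlines()
    logger.info(results)
    return bool(results)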

Here is the log:

2017-04-02 07:53:02,006,1491090782.006675,INFO-netstat -lntp | grep uwsgi | grep 8083 | awk -F '[:]' '{print $2}'

2017-04-02 07:53:02,043,1491090782.043374,INFO-[]
2017-04-02 07:53:02,043,1491090782.043619,INFO-2017-04-02 07:53:02 [ERROR] uwsgi:8083 is down.

2017-04-02 07:53:02,043,1491090782.043733,INFO-2017-04-02 07:53:02 [INFO] try to start uwsgi:8083

2017-04-02 07:53:02,043,1491090782.043814,INFO-cmd:sh /usr/local/sandai/webrtc-env/apprtc/sbin/apprtc.sh start  8083
2017-04-02 07:53:03,100,1491090783.100647,INFO-netstat -lntp | grep uwsgi | grep 8083 | awk -F '[:]' '{print $2}'

2017-04-02 07:53:03,138,1491090783.138201,INFO-['8083                0.0.0.0\n']
2017-04-02 07:53:03,138,1491090783.138506,INFO-2017-04-02 07:53:03 [INFO] uwsgi have been started.

But when I ran ps -ef | grep uwsgi | grep 8083, I found that the server was not down:

[ops01@test 2017.04.02]# ps -ef | grep uwsgi | grep 8083
ops01    22684     1  0  2016 ?        00:03:14 uwsgi --plugin    http,python,gevent --http :8083

Is netstat not a proper way to detect whether a process is down? If not, why? Thanks.


Solution

  • "Server running" and "server listening on the port" are two different things. Depending on how the server is implemented, the process itself can be running while it has failed to start listening on the port. There is also always a window between starting the server and the server actually listening on the port.

    I usually use two separate processes for this purpose:

    • A supervisor process makes sure that the server process itself is running. This can be detected reliably with the fork()/wait() functions (or their Python counterparts). If the server dies, the supervisor restarts it.
    • A monitoring process makes sure that the server is working properly. Here you have to expect false positives from a single failed check, so add some retries or double-checks. If the monitor finds that the server is not functional, it can either notify the supervisor to restart the server, or kill the server itself and let the supervisor restart it.
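
The two roles above can be sketched in Python roughly as follows. This is only a minimal illustration, not the answerer's actual code: the function names port_is_listening and supervise, and the parameters retries, delay, timeout and max_restarts, are hypothetical. The monitor probes the port with a real TCP connection instead of parsing netstat output, and retries before declaring the server down. The supervisor starts the server as a direct child and blocks in wait(), which only works if the server stays in the foreground rather than daemonizing.

```python
import socket
import subprocess
import time


def port_is_listening(host, port, retries=3, delay=1.0, timeout=2.0):
    """Monitor side: return True if a TCP connection to (host, port) succeeds.

    Tries up to `retries` times, sleeping `delay` seconds between attempts,
    so that a single failed probe is not treated as "server down".
    """
    for attempt in range(retries):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Connection refused, timed out, etc. Retry unless this was the last attempt.
            if attempt < retries - 1:
                time.sleep(delay)
    return False


def supervise(cmd, max_restarts=3):
    """Supervisor side: run `cmd` as a child and restart it each time it exits.

    Blocks in wait() until the child terminates, so process death is detected
    directly. Returns the list of exit codes observed (one per run).
    """
    exit_codes = []
    for _ in range(max_restarts + 1):
        child = subprocess.Popen(cmd)
        exit_codes.append(child.wait())
    return exit_codes
```

A monitor loop could then call port_is_listening("127.0.0.1", 8083) periodically and, only when it returns False after all retries, kill the server so that the supervisor restarts it. A real supervisor would typically also add a back-off between restarts; that is left out here to keep the sketch short.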