Tags: python, while-loop, microservices, deadlock, docker-container

Is there a deadlock in my simple loop code?


I have a microservice with a job that should run only if a different server is up. For a few weeks it worked great: if the server was down, the microservice slept for a while without doing the job (as it should), and if the server was up, the job was done. The server is never down for more than a few minutes (for sure! it is highly monitored), so the job is skipped 2-3 times at most.

Today I entered my Docker container and noticed in the logs that the job hasn't even been attempted for a few weeks now (bad choice not to monitor it, I know). I assume this indicates some kind of deadlock, and that the problem is in my exception handling. I could use some advice, since I work alone.

def is_server_healthy():
    url = "url" #correct url for health check path
    try:
        res = requests.get(url)
    except Exception as ex:
        LOGGER.error(f"Can't health check!{ex}")
    finally:
        pass

    return res

def init():
    while True:
        LOGGER.info(f"Sleeping for {SLEEP_TIME} Minutes")
        time.sleep(SLEEP_TIME*ONE_MINUTE)

        res = is_server_healthy()

        if res.status_code == 200:
            my_api.DoJob()
            LOGGER.info(f"Server is: {res.text}")
        else:
            LOGGER.info(f"Server is down... {res.status_code}")

(The names of the variables were changed to simplify the question)

The health check is simple: it returns "up" if the server is up. Anything else is considered down, so unless status 200 and "up" come back, I consider the server to be down.
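The rule just described can be written as a small predicate. This is a sketch; the name is_up is hypothetical. Note that the posted loop only compares res.status_code and never looks at res.text, so a 200 with any other body would still count as up there.

```python
def is_up(status_code: int, body: str) -> bool:
    """Healthy only if the endpoint answered HTTP 200 *and* the body says "up"."""
    # .strip() tolerates surrounding whitespace or a trailing newline;
    # whether the real endpoint sends any is an assumption
    return status_code == 200 and body.strip() == "up"


# With a requests.Response object res: is_up(res.status_code, res.text)
```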


Solution

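  • If the health-check request raises (connection refused, timeout, DNS failure and so on), res is never assigned, so return res fails with an uncaught error. In your original function, where res is assigned inside the try, Python reports it as UnboundLocalError, a subclass of NameError. In the simplified example below, where res is never assigned inside the function at all, it is: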

    NameError: name 'res' is not defined
    

    Why? See:

    def is_server_healthy():
        url = "don't care"
        try:
            raise Exception()  # simulate fail
        except Exception as ex:
            print(f"Can't health check!{ex}")
        finally:
            pass
    
        return res   ## name is not known ;o)
    
    res = is_server_healthy()
    if res.status_code == 200:   # here, next exception bound to happen
        my_api.DoJob()
        LOGGER.info(f"Server is: {res.text}")
    else:
        LOGGER.info(f"Server is down... {res.status_code}")
    

    Even if you declared the name up front (for example res = None before the try), the caller would then try to access an attribute that isn't there:

    if res.status_code == 200:   # here - object has no attribute 'status_code'   
        my_api.DoJob()
        LOGGER.info(f"Server is: {res.text}")
    else:
        LOGGER.info(f"Server is down... {res.status_code}")
    

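    Accessing a member that isn't there raises AttributeError. In both cases nothing catches the exception, so it propagates out of init(), the while True loop ends, and the process is gone. So this is most likely not a deadlock: the loop crashed the first time the health-check request raised an exception instead of returning a response.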


    You are probably better off using a system-specific way to run your script once every minute (cron jobs, Task Scheduler) than idling in a while True: loop with sleep.
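If you do keep the long-running loop, here is a minimal sketch of a more defensive version. The health check catches request exceptions and returns a plain bool. It passes a timeout, because requests applies none by default, so a hung connection could otherwise block forever. Each round is wrapped in its own try/except, so a failure is logged instead of ending the loop. HEALTH_URL, SLEEP_SECONDS, default_fetch and run are hypothetical names, and the 10-second timeout is an arbitrary choice. The fetch, check and sleep parameters exist only so the logic can be exercised without a real server.

```python
import logging
import time
from typing import Callable, Optional, Tuple

LOGGER = logging.getLogger(__name__)

HEALTH_URL = "http://example.invalid/health"  # hypothetical placeholder, use your real health-check URL
SLEEP_SECONDS = 60  # hypothetical; the question sleeps SLEEP_TIME * ONE_MINUTE


def default_fetch(url: str) -> Tuple[int, str]:
    """Return (status_code, body) using requests."""
    import requests  # third-party, imported lazily so the sketch can be tested without it

    # requests waits indefinitely unless a timeout is given; 10 seconds is arbitrary
    res = requests.get(url, timeout=10)
    return res.status_code, res.text


def is_server_healthy(fetch: Callable[[str], Tuple[int, str]] = default_fetch) -> bool:
    """True only for HTTP 200 with body "up"; any exception counts as down."""
    try:
        status_code, body = fetch(HEALTH_URL)
    except Exception as ex:  # e.g. connection refused or timeout
        LOGGER.error("Can't health check! %s", ex)
        return False
    return status_code == 200 and body.strip() == "up"


def run(do_job: Callable[[], None],
        check: Callable[[], bool] = is_server_healthy,
        sleep: Callable[[float], None] = time.sleep,
        iterations: Optional[int] = None) -> None:
    """Sleep, check, maybe run the job; repeat forever (or `iterations` times)."""
    done = 0
    while iterations is None or done < iterations:
        done += 1
        LOGGER.info("Sleeping for %s seconds", SLEEP_SECONDS)
        sleep(SLEEP_SECONDS)
        try:
            if check():
                do_job()
            else:
                LOGGER.info("Server is down, skipping this round")
        except Exception:
            # log the traceback and carry on instead of letting the process die
            LOGGER.exception("Round failed, will retry next round")
```

In the service this would be started as run(my_api.DoJob). A failure in the check or in the job then costs one round rather than the whole process.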