I have a microservice with a job that should run only if a different server is up. For a few weeks it worked great: if the server was down, the microservice slept a bit without doing the job (as it should), and if the server was up, the job was done. The server is never down for more than a few minutes (for sure! the server is highly monitored), so the job is skipped 2-3 times tops.
Today I entered my Docker container and noticed in the logs that the job hasn't even tried to run for a few weeks now (bad choice not to monitor, I know). I assume this means some kind of deadlock happened, and that the problem is in my exception handling. I could use some advice, since I work alone.
```python
def is_server_healthy():
    url = "url"  # correct url for health check path
    try:
        res = requests.get(url)
    except Exception as ex:
        LOGGER.error(f"Can't health check!{ex}")
    finally:
        pass
    return res

def init():
    while True:
        LOGGER.info(f"Sleeping for {SLEEP_TIME} Minutes")
        time.sleep(SLEEP_TIME * ONE_MINUTE)
        res = is_server_healthy()
        if res.status_code == 200:
            my_api.DoJob()
            LOGGER.info(f"Server is: {res.text}")
        else:
            LOGGER.info(f"Server is down... {res.status_code}")
```
(Variable names were changed to simplify the question.)
The health check is simple enough: it returns "up" if the server is up. Anything else is considered down, so unless status 200 and "up" come back, I consider the server to be down.
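If your server is down (or the request fails for any other reason), `requests.get` raises, your `except` block logs it, and then `return res` fails with an uncaught error, because `res` was never assigned:
UnboundLocalError: local variable 'res' referenced before assignment
(UnboundLocalError is a subclass of NameError; Python 3.11+ words the message as "cannot access local variable 'res' where it is not associated with a value".)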
Why? See:
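```python
def fake_get(url):
    # hypothetical stand-in that simulates requests.get failing
    raise Exception("connection refused")

def is_server_healthy():
    url = "don't care"
    try:
        res = fake_get(url)  # raises, so res is never assigned
    except Exception as ex:
        print(f"Can't health check!{ex}")
    finally:
        pass
    return res  # UnboundLocalError: res is local but unbound ;o)

res = is_server_healthy()  # the exception escapes from this call...
if res.status_code == 200:  # ...so this check is never reached
    my_api.DoJob()
    LOGGER.info(f"Server is: {res.text}")
else:
    LOGGER.info(f"Server is down... {res.status_code}")
```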
Even if you defined the name (say, `res = None` before the `try`), the caller would then try to access an attribute that isn't there:
```python
if res.status_code == 200:  # AttributeError: 'NoneType' object has no attribute 'status_code'
    my_api.DoJob()
    LOGGER.info(f"Server is: {res.text}")
else:
    LOGGER.info(f"Server is down... {res.status_code}")
```
Either way, the exception is never caught, so it ends the `while True` loop and the process is gone.
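A minimal sketch of a health check that can't take the loop down with it, assuming the same `requests` library: it catches `requests.exceptions.RequestException` (the base class for connection errors, timeouts and the like) and returns `False` instead of falling through to `return res`. It passes a `timeout` so a server that accepts the connection but never answers can't make the call hang indefinitely, and it returns a plain `bool` that also checks the "up" body described in the question. `HEALTH_URL` and the 5-second timeout are placeholders, not values from the original.

```python
import logging

import requests

LOGGER = logging.getLogger(__name__)

HEALTH_URL = "http://localhost:8080/health"  # hypothetical placeholder URL


def is_server_healthy(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Return True only if the health endpoint answers 200 with body "up".

    Any network problem (refused connection, DNS failure, timeout, ...) is
    logged and treated as "down" instead of escaping to the caller.
    """
    try:
        # Without a timeout, requests can wait for a reply indefinitely
        res = requests.get(url, timeout=timeout)
    except requests.exceptions.RequestException as ex:
        LOGGER.error("Can't health check! %s", ex)
        return False
    # Both conditions from the question: status 200 AND the body says "up"
    return res.status_code == 200 and res.text.strip() == "up"
```

The loop body then shrinks to `if is_server_healthy(): my_api.DoJob()`, with no attribute access on a response that might not exist.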
You are probably better off using a system-specific way to call your script once every minute (cron jobs, Task Scheduler) than idling in a `while True:` loop with `sleep`.
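A rough sketch of the scheduler-driven alternative, under the assumption that the script is started fresh for every run: it performs one check-then-job cycle and exits, so an unexpected exception only costs that single run and the scheduler starts a new process next time. `run_once`, `check` and `job` are hypothetical names; `check` is any callable returning `True` when the server is up, and `job` would be `my_api.DoJob`.

```python
import logging
from typing import Callable

LOGGER = logging.getLogger(__name__)


def run_once(check: Callable[[], bool], job: Callable[[], None]) -> int:
    """Run one check-then-job cycle and return a process exit code.

    Intended to be invoked by cron (or Task Scheduler) once per interval,
    instead of looping forever inside one long-lived process.
    """
    try:
        if not check():
            LOGGER.info("Server is down, skipping the job this time")
            return 0
        job()
        LOGGER.info("Job done")
        return 0
    except Exception:
        # Log the full traceback so the failure is visible, then signal it
        # through a non-zero exit code rather than dying silently
        LOGGER.exception("Health check or job failed")
        return 1
```

From the script's entry point you would call `sys.exit(run_once(check, job))` and schedule it with a crontab entry such as `* * * * * /usr/bin/python3 /path/to/run_job.py` (the path is hypothetical).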