This code is Python 3.5, hosted on PythonAnywhere (Linux).
I am using a with open(...) block to manage a non-blocking flock. Sometimes the scheduled process runs into exceptions that cause the job to terminate. That's OK, but to my confusion the lock is sometimes not released, and all subsequent attempts fail to proceed because they are locked out.
In these circumstances I also see a process that has been alive for many hours ('fetch processes' in the scheduled task tab); presumably this is the process holding the flock. These jobs should normally take a couple of minutes. Killing the process manually solves the problem. I don't understand how this is happening. Something that should trigger a timeout exception sometimes seems to hang (the code uses API calls, some of them concurrent).
It is intermittent: once or twice a month. Can I ask PythonAnywhere to be more aggressive about killing long-running jobs? Would supervisor be a solution?
This is the top of the code:
with open('neto_update_lock.lock', 'w+') as lock_file:
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        print("Can't get a lock. Sorry, stopping now")
        raise
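For reference, here is a minimal, self-contained sketch of the locking behaviour I am relying on. The helper try_lock and the temporary demo.lock path are hypothetical names used only for this illustration; they are not part of my real code. As far as I understand flock, the lock belongs to the open file description, so it is held until every descriptor referring to that description is closed, which would include copies inherited by forked child processes.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open path and try a non-blocking exclusive flock.

    Returns the open file object on success, or None if the lock is taken.
    (Hypothetical helper, for illustration only.)
    """
    f = open(path, 'w+')
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


# Throwaway lock file in a temporary directory (assumption: Linux/Unix host).
lock_path = os.path.join(tempfile.mkdtemp(), 'demo.lock')

first = try_lock(lock_path)    # acquires the lock
second = try_lock(lock_path)   # separate open() call, so it is locked out
first.close()                  # closing the file releases the lock
third = try_lock(lock_path)    # succeeds now that the lock is free
third.close()

print(first is not None, second is None, third is not None)
```

If that understanding is right, a hung process (or a child that inherited the descriptor) would keep everyone else locked out until it exits, which matches what I see.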
I wrapped the calling code in a subprocess like so, as per https://stackoverflow.com/a/26664130/401226:
import argparse
from multiprocessing import Process


def run_with_limited_time(func, args, kwargs, time):
    """Run func in a subprocess and terminate it after `time` seconds.

    Returns True if func finished in time, False if it was terminated.
    """
    p = Process(target=func, args=args, kwargs=kwargs)
    p.start()
    p.join(time)
    if p.is_alive():
        p.terminate()
        print("Terminated due to time out")
        return False
    return True


if __name__ == "__main__":
    # set up argparse
    parser = argparse.ArgumentParser(description='Sync Dear & Neto for Bellbird')
    parser.add_argument('command', choices=['stock', 'PO_and_product'],
                        help='Command: stock, PO_and_product')
    args = parser.parse_args()
    # each command runs in a subprocess with a 25-minute limit
    if args.command == 'stock':
        run_with_limited_time(dear_to_neto_qoh_update, args=[],
                              kwargs={'test_run': False}, time=25*60)
    elif args.command == 'PO_and_product':
        run_with_limited_time(func=update_neto_product_master, args=[],
                              kwargs={'test_run': False, 'verbose': False},
                              time=25*60)
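One variation I am considering, shown here only as a sketch and not what my code currently does: terminate() sends SIGTERM, which a process can ignore or fail to act on, so the wrapper could wait a short grace period and then send SIGKILL. The names run_with_hard_limit, grace, quick_job and stuck_job are hypothetical, and the sketch assumes Linux with the fork start method.

```python
import multiprocessing
import os
import signal
import time


def run_with_hard_limit(func, args, kwargs, limit, grace=5):
    """Run func in a child process; return True if it finished within limit seconds.

    On timeout, send SIGTERM first, then SIGKILL if the child is still alive
    after `grace` seconds. (Hypothetical helper, for illustration only.)
    """
    ctx = multiprocessing.get_context('fork')  # assumption: Linux, fork start method
    p = ctx.Process(target=func, args=args, kwargs=kwargs)
    p.start()
    p.join(limit)
    if not p.is_alive():
        return True
    p.terminate()                       # SIGTERM, sent to the child only
    p.join(grace)                       # give it a moment to exit
    if p.is_alive():
        os.kill(p.pid, signal.SIGKILL)  # cannot be caught or ignored
        p.join()
    print("Terminated due to time out")
    return False


def quick_job():
    """Stand-in for a job that finishes immediately."""
    pass


def stuck_job():
    """Stand-in for a job that hangs and does not react to SIGTERM."""
    signal.signal(signal.SIGTERM, signal.SIG_IGN)
    time.sleep(60)


if __name__ == "__main__":
    print(run_with_hard_limit(quick_job, [], {}, limit=5))
    print(run_with_hard_limit(stuck_job, [], {}, limit=1, grace=1))
```

Note that both signals go to the direct child only; any processes that child started itself would not be signalled by this sketch.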