python concurrency jupyter-notebook python-multithreading papermill

Running Multiple functions in the same Thread in a specific order


I have three functions that each execute a different Jupyter notebook using papermill. I want the first (job1) and second (job2) functions to run concurrently, and the last function (job3) to run only once job1 has finished running without any errors. I'm not sure whether it makes sense to create a new thread for the second function, or how to use the join() method appropriately. I'm running on Windows and for some reason concurrent.futures and multiprocessing don't work, which is why I'm using the threading module.

import threading

import papermill as pm

def job1():
    return pm.execute_notebook('notebook1.ipynb', log_output=False)

def job2():
    return pm.execute_notebook('notebook2.ipynb', log_output=False)

def job3():
    return pm.execute_notebook('notebook3.ipynb', log_output=False)


t1 = threading.Thread(target = job1)
t2 = threading.Thread(target = job2)
t3 = threading.Thread(target = job3)


try:
   t1.start()
   t1.join()
   t2.start()

except:
   pass

finally:

   t3.start()

Solution

  • I like to start off by visualizing the desired flow, which I understand to look like:

    [Diagram: t1 and t2 run side by side, then t3]

    This means that t1 and t2 need to start concurrently and then you need to join on both:

       t1.start() # <- Started 
       t2.start() # <- Started
       # t1 and t2 executing concurrently
    
       t1.join()
       t2.join()
       # wait for both to finish
    
       t3.start()
       t3.join()
    

    The order of the t1 and t2 joins isn't really important, since your program has to wait for the longer-running thread anyway. If t1 finishes first, the program then blocks on t2.join(). If t2 finishes first, the program still waits on t1.join(), and the t2.join() that follows returns immediately.
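
One thing the snippet above does not cover is the "without any errors" part of the question: Thread.join() does not re-raise an exception that occurred inside the thread, so the main thread cannot tell from join() alone whether job1 failed. Below is a minimal sketch of one way to handle that, assuming job3 only depends on job1 (as the question describes) and so does not need to wait for job2. The helper names run_guarded and run_pipeline are hypothetical, made up for this illustration, and the jobs are passed in as plain callables so the sketch does not depend on papermill.

```python
import threading


def run_guarded(func, errors):
    """Call func and record any exception in the shared `errors` list."""
    try:
        func()
    except Exception as exc:  # join() will not re-raise, so capture it here
        errors.append(exc)


def run_pipeline(job1, job2, job3):
    """Run job1 and job2 concurrently; run job3 only if job1 succeeded.

    Returns (ran_job3, job1_errors). Hypothetical helper for illustration.
    """
    job1_errors = []
    t1 = threading.Thread(target=run_guarded, args=(job1, job1_errors))
    t2 = threading.Thread(target=job2)

    t1.start()
    t2.start()  # job1 and job2 are now running concurrently

    t1.join()   # wait only for job1; job2 may still be running

    ran_job3 = False
    if not job1_errors:
        t3 = threading.Thread(target=job3)
        t3.start()
        t3.join()
        ran_job3 = True

    t2.join()   # returns immediately if job2 has already finished
    return ran_job3, job1_errors
```

With the functions from the question this would be called as run_pipeline(job1, job2, job3). If job3 should also wait for job2, move the t2.join() up so it sits right after t1.join(), which gives the same ordering as the snippet above.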