python, vscode-debugger

Can't pause python process using debug


I have a Python script that starts multiple subprocesses using these lines:

for elm in elements:
    t = multiprocessing.Process(target=sub_process,args=[elm])
    threads.append(t)
    t.start()

for t in threads:
    t.join()

Sometimes, for some reason, a thread hangs and the script never finishes. I'm trying to use the VSCode debugger to find the problem and see where in the thread it got stuck, but I'm having trouble pausing these subprocesses. When I click the pause button in the debugger window, it pauses the main thread and some other threads that are running properly, but it won't pause the stuck subprocess.

Even when I try to pause the threads manually, one by one, using the Call Stack window, I can still pause only the working threads and not the stuck one.

Please help me figure this out. It's hard to debug because whatever makes the process hang doesn't always happen.


Solution

  • First, those are subprocesses, not threads. It's important to understand the difference, although it doesn't answer your question.

    Second, a pause (manual break) in the Python debugger will break in Python code. It won't break in the machine code underneath that executes the Python, or in the machine code that performs the OS services the Python code is asking for.

    If you execute a pause, the pause will occur in the Python code above the machine code when (and if) the machine code returns to the Python interpreter loop.

    Given a complete example:

    import multiprocessing
    import time
    
    elements = ["one", "two", "three"]
    
    def sub_process(gs, elm):
        gs.acquire()
        print("sleep", elm)
        time.sleep(60)
        print("awake", elm)
        gs.release()
    
    def test():
        gs = multiprocessing.Semaphore()
    
        subprocs = []
    
        for elm in elements:
            p = multiprocessing.Process(target=sub_process,args=[gs, elm])
            subprocs.append(p)
            p.start()
    
        for p in subprocs:
            p.join()
    
    if __name__ == '__main__':
        test()
    

    The first subprocess will grab the semaphore and sleep for a minute, and the second and third subprocesses will wait inside gs.acquire() until they can move forward. A pause will not break into the debugger until the subprocess returns from the acquire, because the wait inside acquire happens in native code below the Python level.
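
    One way to give a pause somewhere to land is to replace the single indefinite wait with repeated short timed waits, so control keeps coming back to Python code. The following is only a sketch under that assumption; acquire_with_polling and its parameters are hypothetical names, not part of multiprocessing. It relies on the timeout argument that Semaphore.acquire accepts.

```python
import multiprocessing
import os
import time


def acquire_with_polling(sem, poll_seconds=1.0, max_wait=None):
    """Acquire sem using repeated short timed waits instead of one long wait.

    Returns True once acquired, or False if max_wait seconds pass first.
    (Hypothetical helper, written for illustration only.)
    """
    deadline = None if max_wait is None else time.monotonic() + max_wait
    while True:
        # Each timed acquire returns to Python code after at most
        # poll_seconds, so a pending debugger pause can take effect here.
        if sem.acquire(timeout=poll_seconds):
            return True
        print("pid", os.getpid(), "still waiting for semaphore", flush=True)
        if deadline is not None and time.monotonic() >= deadline:
            return False


if __name__ == "__main__":
    gs = multiprocessing.Semaphore()
    # The semaphore is free, so this succeeds immediately.
    print(acquire_with_polling(gs, poll_seconds=0.1, max_wait=0.3))
    # It is now held, so this gives up after roughly max_wait seconds.
    print(acquire_with_polling(gs, poll_seconds=0.1, max_wait=0.3))
    gs.release()
```

    In the example above, sub_process would call acquire_with_polling(gs) where it now calls gs.acquire(). The "still waiting" lines also show which PID is blocked, even without a debugger attached.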

    It sounds like you have an idea where the process is getting stuck, but you don't know why. You need to determine what questions you are trying to answer. For example:

    (Assuming) one of the processes is stuck in acquire. That means one of the other processes didn't release the semaphore. What code in which process is acquiring a semaphore and not releasing it?

    Looking at the semaphore object itself might tell you which subprocess is holding it, but this is a tangent: can you use the debugger to inspect the semaphore and determine who is holding it? For example, using a machine-level debugger on Windows, if these were threads and a critical section, it's possible to look at the critical section and see which thread is still holding it. I don't know if this could be done using processes and semaphores on your chosen platform.
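
    If inspecting the OS-level semaphore isn't practical, the code can do its own bookkeeping instead. The sketch below is an assumption about how that might look, not something the debugger or multiprocessing provides: TrackedSemaphore is a hypothetical wrapper that stores the PID of the current holder in a shared multiprocessing.Value, so any process (or a log line) can report who holds it.

```python
import multiprocessing
import os


class TrackedSemaphore:
    """A count-1 semaphore that records the PID of the process holding it.

    Hypothetical wrapper for illustration; the bookkeeping is ours, it does
    not inspect the underlying OS semaphore.
    """

    def __init__(self):
        self._sem = multiprocessing.Semaphore(1)
        # Shared integer visible to every process; 0 means "nobody holds it".
        self._holder = multiprocessing.Value("i", 0)

    def acquire(self, timeout=None):
        acquired = self._sem.acquire(timeout=timeout)
        if acquired:
            self._holder.value = os.getpid()
        return acquired

    def release(self):
        self._holder.value = 0
        self._sem.release()

    def holder_pid(self):
        return self._holder.value


if __name__ == "__main__":
    gs = TrackedSemaphore()
    gs.acquire()
    print("held by pid", gs.holder_pid())
    gs.release()
    print("held by pid", gs.holder_pid())  # 0 means free
```

    If a process exits without calling release, the recorded PID stays in place, which is itself a clue about where the release was skipped.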

    Which debuggers you have access to depends on the platform you're running on.

    In summary:

    • You can't break the Python debugger in machine code.
    • You can run the Python interpreter in a machine code debugger, but this won't show you the Python code at all, which makes life interesting. This can be helpful if you have an idea what you're looking for - for example, you might be able to tell that you're stuck waiting for a semaphore.
    • Running a machine code debugger becomes more difficult when you're running sub-processes, because you need to know which sub-process you're interested in, and attach to that one. This becomes simpler if you're using a single process and multiple threads instead, since there's only one process to deal with.
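
    Knowing which sub-process to attach to is easier if each one announces its name and PID. A minimal sketch, assuming you are free to add a log line at the top of sub_process; process_tag is a hypothetical helper name.

```python
import multiprocessing
import os


def process_tag(label):
    """Return a one-line description of the current process for log output."""
    proc = multiprocessing.current_process()
    return f"[{proc.name} pid={os.getpid()}] {label}"


if __name__ == "__main__":
    # Inside a subprocess this reports the name that was given to
    # multiprocessing.Process(name=...), or the default name if none was set.
    print(process_tag("starting"))
```

    Printing process_tag(elm) as the first statement of sub_process ties each element to a PID, and that PID is what a machine code debugger asks for when attaching.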

    "You can't get there from here, you have to go someplace else first."

    You'll need to take a closer look at your code and figure out how to answer the questions you need to answer using other means.
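
    One such "other means" is the standard library's faulthandler module, which can write the Python stack of every thread to a file on a timer. The dump shows the Python line that made the blocking call, even though the debugger can't pause there. This is a sketch under assumptions: start_stack_dumps, stop_stack_dumps, and the stacks-<pid>.log file name are hypothetical choices, and each subprocess would have to call start_stack_dumps itself.

```python
import faulthandler
import os
import tempfile
import time


def start_stack_dumps(interval_seconds=30.0, directory="."):
    """Periodically write this process's Python stack traces to a per-PID file.

    Returns the open file object; it has to stay open while dumps are active.
    """
    path = os.path.join(directory, f"stacks-{os.getpid()}.log")
    log = open(path, "w")
    # The dump is written by a watchdog thread inside the interpreter, so it
    # does not depend on the stuck thread getting back to Python code.
    faulthandler.dump_traceback_later(interval_seconds, repeat=True, file=log)
    return log


def stop_stack_dumps(log):
    """Cancel the periodic dumps and close the log file."""
    faulthandler.cancel_dump_traceback_later()
    log.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = start_stack_dumps(interval_seconds=0.2, directory=tmp)
        time.sleep(0.5)  # stands in for the blocking call that never returns
        stop_stack_dumps(log)
        with open(log.name) as f:
            print(f.read())
```

    When the script hangs, the log file belonging to the stuck PID shows which Python call that process never returned from.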