Tags: python, stack-trace, pool

Do not print stack-trace using Pool python


I use a Pool to run several commands simultaneously. I would like to avoid printing the stack trace when the user interrupts the script.

Here is my script structure:

from multiprocessing import Pool
from subprocess import Popen, PIPE

def worker(some_element):
    try:
        cmd_res = Popen(SOME_COMMAND, stdout=PIPE, stderr=PIPE).communicate()
    except (KeyboardInterrupt, SystemExit):
        pass
    except Exception, e:
        print str(e)
        return

    #deal with cmd_res...

pool = Pool()
try:
    pool.map(worker, some_list, chunksize=1)
except KeyboardInterrupt:
    pool.terminate()
    print 'bye!'

By calling pool.terminate() when KeyboardInterrupt is raised, I expected the stack trace not to be printed, but it doesn't work; I sometimes get something like:

^CProcess PoolWorker-6:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 374, in get
    racquire()
KeyboardInterrupt
Process PoolWorker-1:
Process PoolWorker-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Traceback (most recent call last):

...
bye!

Do you know how I could hide this?

Thanks.


Solution

  • In your case you don't even need pool processes or threads, and it then becomes easier to silence KeyboardInterrupt with try/except.

    Pool processes are useful when your Python code does CPU-consuming calculations that can profit from parallelization. Threads are useful when your Python code does complex blocking I/O that can run in parallel. You just want to execute multiple programs in parallel and wait for the results. When you use Pool, you create processes that do nothing but start other processes and wait for them to terminate.

    The simplest solution is to create all of the processes in parallel and then to call .communicate() on each of them:

    try:
        processes = []
        # Start all processes at once
        for element in some_list:
            processes.append(Popen(SOME_COMMAND, stdout=PIPE, stderr=PIPE))
        # Fetch their results sequentially
        for process in processes:
            cmd_res = process.communicate()
            # Process your result here
    except KeyboardInterrupt:
        for process in processes:
            try:
                process.terminate()
            except OSError:
                pass
    

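    If you need to know which element each result belongs to, one option (a small sketch reusing the same SOME_COMMAND and some_list placeholders, not part of the original code) is to zip the input list with the started processes:

    processes = [Popen(SOME_COMMAND, stdout=PIPE, stderr=PIPE) for element in some_list]
    for element, process in zip(some_list, processes):
        cmd_res = process.communicate()
        # cmd_res now belongs to this particular element
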
    This works as long as the output on STDOUT and STDERR isn't too big. Otherwise, if a process other than the one communicate() is currently being called on produces more output than the PIPE buffer can hold (usually around 1-8 kB), the OS suspends it until communicate() is eventually called on it. In that case you need a more sophisticated solution:

    Asynchronous I/O

    Since Python 3.4 you can use the asyncio module for single-thread pseudo-multithreading:

    import asyncio
    from asyncio.subprocess import PIPE
    
    loop = asyncio.get_event_loop()
    
    @asyncio.coroutine
    def worker(some_element):
        process = yield from asyncio.create_subprocess_exec(*SOME_COMMAND, stdout=PIPE)
        try:
            cmd_res = yield from process.communicate()
        except KeyboardInterrupt:
            process.terminate()
            return
        try:
            pass # Process your result here
        except KeyboardInterrupt:
            return
    
    # Start all workers, wrapping each coroutine in a Task so it is scheduled only once
    workers = []
    for element in some_list:
        w = asyncio.async(worker(element))
        workers.append(w)
    
    # Run until everything complete
    loop.run_until_complete(asyncio.wait(workers))
    

    You should be able to limit the number of concurrent processes using e.g. asyncio.Semaphore if you need to.
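    For example, here is a minimal sketch of that idea (the limited_worker name and the limit of 4 are made up for illustration; SOME_COMMAND is the same placeholder as above):

    semaphore = asyncio.Semaphore(4)   # allow at most 4 subprocesses at a time

    @asyncio.coroutine
    def limited_worker(some_element):
        # Wait until one of the 4 slots is free before spawning the subprocess
        yield from semaphore.acquire()
        try:
            process = yield from asyncio.create_subprocess_exec(*SOME_COMMAND, stdout=PIPE)
            cmd_res = yield from process.communicate()
            # Process your result here
        finally:
            semaphore.release()

    You would then schedule limited_worker exactly like worker above.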