Python-Subprocess-Popen inconsistent behavior in a multi-threaded environment

I have the following piece of code running inside a thread. 'executable' produces a unique string output for each input 'url':

p = Popen(["executable", url], stdout=PIPE, stderr=PIPE, close_fds=True)
output,error = p.communicate()
print output

When the above code is executed for multiple input 'urls', the 'output' produced by the subprocess p is not consistent. For some of the urls, the subprocess gets terminated without producing any 'output'. I tried printing p.returncode for each failed 'p' instance (the failed urls are not consistent across multiple runs either) and got '-11' as the return code, with 'error' being an empty string. Can someone please suggest a way to get consistent behavior/output for each run in a multithreaded environment?

Solution

-11 as a return code means the child process was killed by signal 11 (SIGSEGV on Linux), which suggests that the C program is not fine, e.g., you are starting too many subprocesses and that causes SIGSEGV in the C executable. You can limit the number of concurrent subprocesses using multiprocessing.pool.ThreadPool, concurrent.futures.ThreadPoolExecutor, or threading + Queue-based solutions:

#!/usr/bin/env python
from multiprocessing.dummy import Pool # uses threads
from subprocess import Popen, PIPE

def get_url(url):
    # run one subprocess and collect everything needed to diagnose a failure
    p = Popen(["executable", url], stdout=PIPE, stderr=PIPE, close_fds=True)
    output, error = p.communicate()
    return url, output, error, p.returncode

urls = ["url1", "url2"] # placeholder: replace with your real list of urls

pool = Pool(20) # limit number of concurrent subprocesses
for url, output, error, returncode in pool.imap_unordered(get_url, urls):
    print("%s %r %r %d" % (url, output, error, returncode))
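For comparison, here is a rough sketch of the same limit using concurrent.futures.ThreadPoolExecutor. The names run_one and run_all are made up for illustration, and the demo uses a child Python process that echoes its argument as a stand-in for 'executable', so treat it as a starting point rather than a drop-in replacement:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(command, url):
    """Run command + [url] and return (url, stdout, stderr, returncode)."""
    p = subprocess.Popen(command + [url],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                         close_fds=True)
    output, error = p.communicate()
    return url, output, error, p.returncode


def run_all(command, urls, max_workers=20):
    """Run the command for every url, at most max_workers at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() returns results in input order, unlike imap_unordered()
        return list(executor.map(lambda u: run_one(command, u), urls))


if __name__ == "__main__":
    # Stand-in for "executable" (an assumption): echo the argument back.
    echo = [sys.executable, "-c", "import sys; print(sys.argv[1])"]
    for url, output, error, returncode in run_all(echo, ["url1", "url2"], 2):
        print("%s %r %r %d" % (url, output, error, returncode))
```

With the real program you would call run_all(["executable"], urls) and lower max_workers until the -11 return codes stop appearing.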

Make sure the executable can be run in parallel, e.g., that it doesn't use some shared resource. To test, you could run in a shell:

$ executable url1 & executable url2

Could you please explain more about "you are starting too many subprocesses and it causes SIGSEGV in the C executable" and a possible solution to avoid that?

Possible problem:

"too many processes"
-> "not enough memory in the system or some other resource"
-> "trigger the bug in the C code that otherwise is hidden or rare"
-> "illegal memory access"
-> SIGSEGV

The solution suggested above is:

"limit number of concurrent processes"
-> "enough memory or other resources in the system"
-> "bug is hidden or rare"
-> no SIGSEGV

See "What is SIGSEGV run time error in C++?" for background. In short, your program is killed with that signal if it tries to access memory that it is not supposed to. Here's an example of such a program:

/* try to fail with SIGSEGV sometimes */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
  char *null_pointer = NULL;

  srand((unsigned)time(NULL));

  if (rand() < RAND_MAX/2) /* simulate some concurrent condition 
                              e.g., memory pressure */
    fprintf(stderr, "%c\n", *null_pointer); /* dereference null pointer */

  return 0;
}

If you run it from the Python script above, p.returncode will occasionally be -11.
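To make such failures easier to read in the output, you could translate the return code into a signal name. This is a minimal sketch assuming Python 3.5+ (for signal.Signals); describe_returncode is a hypothetical helper name, not part of the subprocess module:

```python
import signal


def describe_returncode(returncode):
    """Turn a Popen.returncode value into a human-readable description."""
    if returncode is None:
        return "still running"
    if returncode < 0:
        # On POSIX, -N means the child was terminated by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal %d" % -returncode
        return "killed by %s" % name
    return "exited with status %d" % returncode
```

On Linux, where SIGSEGV is signal 11, describe_returncode(-11) reports the segmentation fault by name instead of a bare number.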

Also, p.returncode is not sufficient for debugging purposes. Is there any other option to get more debug info to get to the root cause?

I wouldn't rule out the Python side completely, but it is most likely that the problem is in the C program. You could use gdb to get a backtrace and see where in the call stack the error comes from.
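One way to do that from Python is to rerun a failed url under gdb in batch mode and capture what it prints. This is only a sketch: it assumes gdb is installed and that 'executable' was built with debug symbols (e.g., -g), and gdb_command and backtrace are hypothetical helper names:

```python
import subprocess


def gdb_command(argv):
    """Build a gdb command line that runs argv and then prints a backtrace."""
    # --batch: exit when done; -ex: run a gdb command; --args: program + args
    return ["gdb", "--batch", "-ex", "run", "-ex", "bt", "--args"] + list(argv)


def backtrace(argv):
    """Run argv under gdb and return gdb's combined output as text."""
    p = subprocess.Popen(gdb_command(argv),
                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output, _ = p.communicate()
    return output.decode(errors="replace")
```

For example, print(backtrace(["executable", url])) for a url whose return code was -11. Since the crash seems to show up only under load, a single run under gdb may finish normally, so you may have to repeat it or run it while the other subprocesses are active.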