
Python Popen: How do I block the execution of the grep command until the content to grep is ready?


I have been fighting with Popen in Python for a couple of days now, so I decided to put all my questions here in the hope that Python experts can clarify them.

Initially I used Popen with shell=False to execute a command and grep the result as a single piped command (something like xxx | grep yyy). As you can imagine, that did not work. Following the guide in this post, I changed my code to the following:

checkCmd = ["sudo", "pyrit", "-r", self.capFile, "analyze"]
checkExec = Popen(checkCmd, shell=False, stdout=PIPE, stderr=STDOUT)
grepExec = Popen(["grep", "good"], stdin=checkExec.stdout, stdout=PIPE)
output = grepExec.stdout.readline()
output = grepExec.communicate()[0]

But I realized that checkExec runs slowly, and since Popen is non-blocking, grepExec always gets executed before checkExec shows any result, so the grep output is always blank. How can I postpone the execution of grepExec until checkExec has finished?

  1. In another Popen call in my program, I keep a service running in the background, so I use a separate thread to launch it. When all the tasks are done, I notify this thread to quit and explicitly call Popen.kill() to stop the service. However, my system ends up with a zombie process that is never reaped. Is there a clean way to clean up everything in this background thread after it finishes?

  2. What are the differences between Popen.communicate()[0] and Popen.stdout.readline()? Can I use a loop to keep reading output from both of them?


Solution

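  • grep blocks on its stdin until checkExec produces output, so there is no need to delay it. Your example would work if you read grep's output like this: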

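    from subprocess import Popen, PIPE, STDOUT

    checkCmd = ["sudo", "pyrit", "-r", self.capFile, "analyze"]
    checkExec = Popen(checkCmd, shell=False, stdout=PIPE, stderr=STDOUT)
    grepExec = Popen(["grep", "good"], stdin=checkExec.stdout, stdout=PIPE)
    checkExec.stdout.close()  # let checkExec receive SIGPIPE if grep exits first

    for line in grepExec.stdout:
        print(line)  # do something with line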

    Use communicate() when you want to feed input to a process and read everything it writes to stdout and stderr in one call. It blocks until the process exits and returns all the output at once, so it is probably not what you want here. readline(), like iterating over stdout, returns one line as soon as it is available. communicate() suits the case where you start an application, give it all the input it needs, and collect its output.
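To make the difference concrete, here is a minimal sketch. It assumes a stand-in child process (a short Python script launched through sys.executable, not pyrit) that prints three lines with a pause between them; the function names are hypothetical. Iterating over stdout hands you each line as it arrives, while communicate() blocks until the child exits and returns everything in one piece.

```python
import subprocess
import sys

# Stand-in for a slow command: prints three lines, pausing after each one.
CHILD = (
    "import time\n"
    "for w in ('bad', 'good 1', 'good 2'):\n"
    "    print(w, flush=True)\n"
    "    time.sleep(0.1)\n"
)

def read_incrementally():
    """Filter the child's output line by line, as each line arrives."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE, universal_newlines=True)
    matches = [line.rstrip("\n") for line in proc.stdout if "good" in line]
    proc.stdout.close()
    proc.wait()  # collect the exit status so the child is reaped
    return matches

def read_all_at_once():
    """Let communicate() gather the whole output after the child exits."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE, universal_newlines=True)
    out, _ = proc.communicate()  # blocks until exit and waits on the child
    return out.splitlines()
```

Mixing the two on the same pipe, as in the question, is best avoided: a line already consumed by readline() is no longer part of what communicate() returns.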

    As other answers have pointed out, you can use shell=True to build the pipeline in a single subprocess call. An alternative, which I prefer, is to skip the pipeline and do the filtering in Python:

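    from subprocess import Popen, PIPE, STDOUT

    checkCmd = ["sudo", "pyrit", "-r", self.capFile, "analyze"]
    # universal_newlines=True yields text lines rather than bytes on Python 3
    checkExec = Popen(checkCmd, shell=False, stdout=PIPE, stderr=STDOUT, universal_newlines=True)
    for line in checkExec.stdout:
        if 'good' in line:
            print(line)  # do something with the matched line here
    checkExec.wait()  # reap the child once its output is exhausted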
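On the zombie process mentioned in the question: a child that has exited stays in the process table until its parent collects its exit status, so calling kill() on its own is not enough; it needs to be followed by wait(). A minimal sketch, assuming a hypothetical helper named stop_service and a stand-in service (a Python one-liner that sleeps) in place of the real one:

```python
import subprocess
import sys

def stop_service(proc, timeout=5):
    """Stop a child process and reap it so no zombie is left behind."""
    if proc.poll() is None:          # still running
        proc.terminate()             # ask politely first (SIGTERM on POSIX)
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()              # force it if it ignores the request
    return proc.wait()               # collecting the exit status reaps the child

# Hypothetical stand-in for the background service.
service = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_service(service)
```

The background thread can call a helper like this just before it returns, so the cleanup happens in the same place that started the process.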