I am running several cat | zgrep commands on a remote server and gathering their output individually for further processing:
import multiprocessing as mp
import subprocess

class MainProcessor(mp.Process):
    def __init__(self, peaks_array):
        super(MainProcessor, self).__init__()
        self.peaks_array = peaks_array

    def run(self):
        # Spawn one PeakProcessor per peak array.
        for peak_arr in self.peaks_array:
            peak_processor = PeakProcessor(peak_arr)
            peak_processor.start()

class PeakProcessor(mp.Process):
    def __init__(self, peak_arr):
        super(PeakProcessor, self).__init__()
        self.peak_arr = peak_arr

    def run(self):
        # Run the remote zgrep and split its output into lines.
        command = 'ssh remote_host cat files_to_process | zgrep --mmap "regex" '
        log_lines = subprocess.check_output(command, shell=True).split('\n')
        process_data(log_lines)
This, however, results in sequential execution of the subprocess('ssh ... cat ...') calls: the second peak waits for the first to finish, and so on.

How can I modify this code so that the subprocess calls run in parallel, while still collecting the output of each individually?
Another approach (rather than the other suggestion of putting shell processes in the background) is to use multithreading.
The run method that you have would then do something like this (note that thread.start_new_thread requires an args tuple, even if it is empty):

import thread
thread.start_new_thread(myFuncThatDoesZGrep, ())
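Here, myFuncThatDoesZGrep is just a placeholder. A minimal sketch of such a worker, reusing the command string from your question, might look like:

import subprocess

def myFuncThatDoesZGrep():
    # Illustrative worker: run the remote zgrep and keep its output lines.
    command = 'ssh remote_host cat files_to_process | zgrep --mmap "regex" '
    output = subprocess.check_output(command, shell=True)
    return output.split('\n')

A bare function started this way has nowhere convenient to put its return value, which is why a thread class that stores its own results is more useful.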
To collect results, you can do something like this:
import threading

class MyThread(threading.Thread):
    def run(self):
        self.finished = False
        # Your code to run the command here.
        blahBlah()
        # Store the results first, then flip the flag, so a reader
        # never sees finished == True before results exists.
        self.results = []
        self.finished = True
Start the thread as shown above. Once your thread object has myThread.finished == True, you can collect the results via myThread.results.
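To tie this back to your original code, here is a minimal sketch that runs one command per peak in parallel and keeps each output separate. ZgrepThread is an illustrative name, and peaks_array, process_data, and the command string are assumed from your question:

import subprocess
import threading

class ZgrepThread(threading.Thread):
    def __init__(self, command):
        threading.Thread.__init__(self)
        self.command = command
        self.results = None

    def run(self):
        # Each thread blocks on its own subprocess, so the ssh/zgrep
        # pipelines run concurrently across threads.
        output = subprocess.check_output(self.command, shell=True,
                                         universal_newlines=True)
        self.results = output.split('\n')

# peaks_array and process_data are assumed from the question; the command
# here is the same placeholder string the question uses.
threads = []
for peak_arr in peaks_array:
    t = ZgrepThread('ssh remote_host cat files_to_process | zgrep --mmap "regex" ')
    threads.append(t)
    t.start()

for t in threads:
    t.join()                  # wait for this command to finish
    process_data(t.results)   # each thread's output stays separate

Calling join() sidesteps polling the finished flag entirely: it blocks until the thread's run method has returned, at which point results is guaranteed to be set.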