Better examples of Parallel processing in Python

I hope I am not downvoted this time. I have been struggling with parallel processing in Python for a while (two days, to be exact). I have been checking these resources (a partial list is shown here):



I came unstuck. What I want to do is this:


Master:

Break up the file into chunks (strings or numbers)
Broadcast the pattern to be searched to all the workers
Receive the offsets in the file where the pattern was found

Workers:

Receive the pattern and a chunk of text from the master
Send back the offsets to the master
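
The plan above can be sketched roughly as follows. This is only a minimal illustration, assuming the whole text fits in memory and the pattern is non-empty. The helper names (`find_offsets`, `make_chunks`, `search_chunk`, `parallel_search`) are hypothetical, and `str.find` stands in for the hand-written naive search. Each chunk carries `len(pat) - 1` extra characters so that a match straddling a chunk boundary is not lost, and each worker adds the chunk's base position so the master receives offsets into the whole text.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def find_offsets(pat, txt):
    """Return every index at which pat occurs in txt (overlaps included)."""
    offsets = []
    start = txt.find(pat)
    while start != -1:
        offsets.append(start)
        start = txt.find(pat, start + 1)
    return offsets


def make_chunks(txt, chunk_len, overlap):
    """Split txt into (base_offset, piece) pairs.

    Each piece carries `overlap` extra characters from the next chunk so a
    match that straddles a boundary is still seen by exactly one worker.
    """
    return [(base, txt[base:base + chunk_len + overlap])
            for base in range(0, len(txt), chunk_len)]


def search_chunk(pat, chunk):
    """Worker: search one chunk and translate hits to whole-text offsets."""
    base, piece = chunk
    return [base + off for off in find_offsets(pat, piece)]


def parallel_search(pat, txt, chunk_len=4, nprocs=2):
    """Master: chunk the text, farm the chunks out, merge the offsets."""
    chunks = make_chunks(txt, chunk_len, overlap=len(pat) - 1)
    with ProcessPoolExecutor(max_workers=nprocs) as executor:
        per_chunk = executor.map(partial(search_chunk, pat), chunks)
        return sorted(off for hits in per_chunk for off in hits)


if __name__ == "__main__":
    # Tiny in-memory text instead of a file; chunk_len is small on purpose
    print(parallel_search("afow", "xxafowxxxafowafow"))  # prints [2, 9, 13]
```

For a real file the chunk length would be far larger than 4; the small value here only forces matches to cross chunk boundaries in the demo.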

I tried to implement this using MPI/concurrent.futures/multiprocessing and came unstuck.

My naive implementation using the multiprocessing module:

import multiprocessing

filename = "file1.txt"
pat = "afow"
N = 1000

""" This is the naive string search algorithm"""

def search(pat, txt):

    patLen = len(pat)
    txtLen = len(txt)
    offsets = []

    # A loop to slide pattern[] one by one
    # Range generates numbers up to but not including that number
    for i in range((txtLen - patLen) + 1):

        # Can not use a for loop here
        # For loops in C with && statements must be
        # converted to while statements in python
        counter = 0
        while (counter < patLen) and pat[counter] == txt[counter + i]:
            counter += 1
            if counter >= patLen:
                # Full match: record the offset where it starts
                offsets.append(i)
    return str(offsets).strip('[]')

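# This is what I want
if __name__ == "__main__":
    pool_outputs = []
    pool = multiprocessing.Pool(processes=5)
    with open(filename, 'r') as infile:
        lines = []
        for line in infile:
            lines.append(line.rstrip('\n'))
            # Hand a full batch of lines to the pool
            if len(lines) > N:
                # NOTE: search() expects (pat, txt), but map() supplies
                # only one argument per call
                pool_output = pool.map(search, lines)
                pool_outputs.append(pool_output)
                lines = []
        # Process whatever is left over after the last full batch
        if len(lines) > 0:
            pool_output = pool.map(search, lines)
            pool_outputs.append(pool_output)
    print('Pool:', pool_outputs)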

with open(filename, 'r') as infile:
    for line in infile:
        print(search(pat, line))

I would be grateful for any guidance, especially with concurrent.futures. Thanks for your time. Valeriy helped me with his addition and I thank him for that.

But if anyone could just indulge me for a moment, this is the code I was working on for concurrent.futures (working off an example I saw somewhere):

from concurrent.futures import ProcessPoolExecutor, as_completed
import math

def search(pat, txt):

    patLen = len(pat)
    txtLen = len(txt)
    offsets = []

    # A loop to slide pattern[] one by one
    # Range generates numbers up to but not including that number
    for i in range((txtLen - patLen) + 1):

        # Can not use a for loop here
        # For loops in C with && statements must be
        # converted to while statements in python
        counter = 0
        while (counter < patLen) and pat[counter] == txt[counter + i]:
            counter += 1
            if counter >= patLen:
                # Full match: record the offset where it starts
                offsets.append(i)
    return str(offsets).strip('[]')

# Check a list of strings
def chunked_worker(lines):
    # Key each result by its line so entries do not overwrite one another
    return {line: search("fmo", line) for line in lines}

def pool_bruteforce(filename, nprocs):
    with open(filename) as f:
        lines = [line.rstrip('\n') for line in f]
    chunksize = int(math.ceil(len(lines) / float(nprocs)))
    futures = []

    with ProcessPoolExecutor(max_workers=nprocs) as executor:
        for i in range(nprocs):
            chunk = lines[(chunksize * i): (chunksize * (i + 1))]
            futures.append(executor.submit(chunked_worker, chunk))

    # Merge the per-chunk dictionaries as the workers finish
    resultdict = {}
    for f in as_completed(futures):
        resultdict.update(f.result())
    return resultdict

if __name__ == "__main__":
    filename = "file1.txt"
    print(pool_bruteforce(filename, 5))

Thanks again, Valeriy and anyone who attempts to help me solve my riddle.


  • search takes two arguments, but pool.map passes only one item per call, so bind the pattern with functools.partial:

    import multiprocessing
    from functools import partial
    filename = "file1.txt"
    pat = "afow"
    N = 1000
    """ This is the naive string search algorithm"""
    def search(pat, txt):
        patLen = len(pat)
        txtLen = len(txt)
        offsets = []
        # A loop to slide pattern[] one by one
        # Range generates numbers up to but not including that number
        for i in range((txtLen - patLen) + 1):
            # Can not use a for loop here
            # For loops in C with && statements must be
            # converted to while statements in python
            counter = 0
            while (counter < patLen) and pat[counter] == txt[counter + i]:
                counter += 1
                if counter >= patLen:
                    # Full match: record the offset where it starts
                    offsets.append(i)
        return str(offsets).strip('[]')
    if __name__ == "__main__":
        pool_outputs = []
        lines = []
        with open(filename, 'r') as infile:
            for line in infile:
                lines.append(line.rstrip('\n'))
        # Bind the pattern so that map() only has to supply the text
        func = partial(search, pat)
        if len(lines) > 0:
            with multiprocessing.Pool(processes=5) as pool:
                pool_outputs = pool.map(func, lines)
        print('Pool:', pool_outputs)
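
An alternative to functools.partial, if it helps, is Pool.starmap (available since Python 3.3), which unpacks each tuple in the iterable into the function's arguments. The sketch below is only an illustration: `search_lines` and the sample lines are hypothetical, and `search` here is a compact stand-in that returns a list of offsets rather than the formatted string used above.

```python
import multiprocessing


def search(pat, txt):
    """Naive search returning a list of offsets (a list here, not a string)."""
    return [i for i in range(len(txt) - len(pat) + 1)
            if txt[i:i + len(pat)] == pat]


def search_lines(pat, lines, nprocs=2):
    """Run search(pat, line) for every line on a pool of worker processes."""
    # starmap unpacks each (pat, line) tuple into the two arguments of search
    with multiprocessing.Pool(processes=nprocs) as pool:
        return pool.starmap(search, [(pat, line) for line in lines])


if __name__ == "__main__":
    sample = ["xxafow", "nothing here", "afowafow"]  # stand-in for file lines
    print(search_lines("afow", sample))  # prints [[2], [], [0, 4]]
```

partial keeps the call site closer to the original pool.map code, while starmap makes the per-task arguments explicit, which may be handier if the pattern ever varies per task.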