
Python programming: multiprocessing


I have a CPU-intensive script and access to a multi-core machine, but the script only ever uses one core. How can I use the multiprocessing library in Python 3 (or something else) to parallelize it? Any suggestions on how to modify the script are welcome. Thank you!

from nltk.corpus import wordnet as wn
from itertools import chain

# infile, Dict, lmtzr, syn() and csv_writer() are defined earlier in the script
for line in infile:
    word = line.strip()
    if word not in Dict:
        Dict[word] = (set(["-", "-", "-"]), 0)
    lemma = lmtzr.lemmatize(word)
    synonyms = []
    for w, net1, net2, lch in syn(lemma):
        for l in net2.lemmas():
            synonyms.append(l.name())
    Dict[word] = (set(synonyms), round(lch, 2))

infile.close()

csv_writer(Dict, "Text8_types_similar_lch.csv")

Solution

  • You can use joblib. First, put your code in a function that works on any number of lines. You can either have the function write its results to a csv file (giving you a separate file per process, which you will have to merge afterwards), or just return something:

    def my_func(lines):
        return_dict = {}
        for line in lines:
            # put your code here
        return return_dict
    

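    A minimal sketch of what such a function could look like, using plain wn.synsets() to collect synonyms; the syn()/lch logic from the question would slot into the loop in the same way (lmtzr here is just an NLTK WordNetLemmatizer, not necessarily the one from the original script):

    from nltk.corpus import wordnet as wn
    from nltk.stem import WordNetLemmatizer

    lmtzr = WordNetLemmatizer()

    def my_func(lines):
        # Each worker builds its own dictionary and returns it.
        return_dict = {}
        for line in lines:
            word = line.strip()
            lemma = lmtzr.lemmatize(word)
            synonyms = set()
            for synset in wn.synsets(lemma):
                for l in synset.lemmas():
                    synonyms.add(l.name())
            return_dict[word] = synonyms
        return return_dict
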
    Then, write a function to split up lines into chunks of some smaller size:

    from itertools import islice
    
    def grouper(n, iterable):
        it = iter(iterable)
        while True:
            chunk = tuple(islice(it, n))
            if not chunk:
                return
            yield chunk
    
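    For example, grouping a small list of lines two at a time:

    lines = ["alpha", "beta", "gamma", "delta", "epsilon"]
    for chunk in grouper(2, lines):
        print(chunk)
    # ('alpha', 'beta')
    # ('gamma', 'delta')
    # ('epsilon',)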

    Finally, call joblib's Parallel to pass each chunk of data to your function:

    from joblib import Parallel, delayed
    import multiprocessing

    # lines is the iterable of input lines, e.g. infile.readlines()
    num_cores = multiprocessing.cpu_count()  # or pass n_jobs=-1 to use all cores
    results = Parallel(n_jobs=num_cores)(
        delayed(my_func)(line_chunk) for line_chunk in grouper(500, lines))
    

    results will then be a list of the dictionaries returned by my_func, which you can merge however you like.
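
    For example, to merge the per-chunk dictionaries back into one and reuse the csv_writer call from the question (assuming it is acceptable for later chunks to overwrite duplicate words, which only happens if the same word appears in more than one chunk):

    merged = {}
    for partial_dict in results:
        merged.update(partial_dict)   # later chunks win on duplicate words
    csv_writer(merged, "Text8_types_similar_lch.csv")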