
Python programming: multiprocessing


I have a CPU-intensive script and access to a multi-core machine, but the script only ever uses one core. How can I use the multiprocessing library in Python 3 (or something else) to parallelize it? Any suggestions on how to modify the script are welcome. Thank you!

from nltk.corpus import wordnet as wn
from itertools import chain

# infile, Dict, lmtzr, syn() and csv_writer() are defined earlier in the script
for line in infile:
    word = line.strip()
    if word not in Dict:
        Dict[word] = (set(["-", "-", "-"]), 0)
    lemma = lmtzr.lemmatize(word)
    synonyms = []
    for w, net1, net2, lch in syn(lemma):
        for l in net2.lemmas():
            synonyms.append(l.name())
    Dict[word] = (set(synonyms), round(lch, 2))

infile.close()

csv_writer(Dict, "Text8_types_similar_lch.csv")

Solution

  • You can use joblib. First, put your code in a function that works on any number of lines. You can either have the function write its results to a csv file (giving you a separate file per process, which you will have to merge afterwards), or just return something:

    def my_func(lines):
        return_dict = {}
        for line in lines:
            # put your code here
        return return_dict
    

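    A minimal sketch of what such a function could look like, using plain wn.synsets() to collect synonyms; the syn()/lch logic from the question would slot into the loop in the same way (lmtzr here is just an NLTK WordNetLemmatizer, not necessarily the one from the original script):

    from nltk.corpus import wordnet as wn
    from nltk.stem import WordNetLemmatizer

    lmtzr = WordNetLemmatizer()

    def my_func(lines):
        # Each worker builds its own dictionary and returns it.
        return_dict = {}
        for line in lines:
            word = line.strip()
            lemma = lmtzr.lemmatize(word)
            synonyms = set()
            for synset in wn.synsets(lemma):
                for l in synset.lemmas():
                    synonyms.add(l.name())
            return_dict[word] = synonyms
        return return_dict
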
    Then, write a function to split up lines into chunks of some smaller size:

    from itertools import islice
    
    def grouper(n, iterable):
        it = iter(iterable)
        while True:
            chunk = tuple(islice(it, n))
            if not chunk:
                return
            yield chunk
    
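    For example, grouping a small list of lines two at a time:

    lines = ["alpha", "beta", "gamma", "delta", "epsilon"]
    for chunk in grouper(2, lines):
        print(chunk)
    # ('alpha', 'beta')
    # ('gamma', 'delta')
    # ('epsilon',)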

    Finally, call joblib's Parallel to pass each chunk of data to your function:

    from joblib import Parallel, delayed
    import multiprocessing

    # lines is the iterable of input lines, e.g. infile.readlines()
    num_cores = multiprocessing.cpu_count()  # or pass n_jobs=-1 to use all cores
    results = Parallel(n_jobs=num_cores)(
        delayed(my_func)(line_chunk) for line_chunk in grouper(500, lines))
    

    results will then be a list of the dictionaries returned by my_func, which you can merge however you like.
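
    For example, to merge the per-chunk dictionaries back into one and reuse the csv_writer call from the question (assuming it is acceptable for later chunks to overwrite duplicate words, which only happens if the same word appears in more than one chunk):

    merged = {}
    for partial_dict in results:
        merged.update(partial_dict)   # later chunks win on duplicate words
    csv_writer(merged, "Text8_types_similar_lch.csv")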