I have a CPU-intensive script, and I have access to a multi-core machine, but it isn't using more than one core. How can I use the multiprocessing library in Python 3 to parallelize it? Or is there a better option? Any suggestions on how to modify the script are welcome. Thank you!
from nltk.corpus import wordnet as wn
from itertools import chain
for line in infile:
    word = line.strip()
    if word not in Dict:
        Dict[word] = (set(["-", "-", "-"]), 0)
    lemma = lmtzr.lemmatize(word)
    for w, net1, net2, lch in syn(lemma):
        if word not in Dict:
            Dict[word] = {}
        for l in net2.lemmas():
            synonyms.append(l.name())
        Dict[word] = (set(synonyms), round(lch, 2))
        synonyms = []
infile.close()
csv_writer(Dict, "Text8_types_similar_lch.csv")
You can use joblib. First, put your code in a function that works on any number of lines. You can either have the function write its results to a CSV file (one file per chunk, which you will then have to merge), or just return something:
def my_func(lines):
    return_dict = {}
    for line in lines:
        # put your code here, filling return_dict instead of a global
        ...
    return return_dict
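As a rough sketch (not a drop-in replacement), your loop body could be moved into my_func like this, assuming your lmtzr and syn objects from the original script are defined at module level so the worker processes can use them, and writing into a local dict instead of the global Dict:

def my_func(lines):
    return_dict = {}
    for line in lines:
        word = line.strip()
        if word not in return_dict:
            # default entry, as in your script
            return_dict[word] = (set(["-", "-", "-"]), 0)
        lemma = lmtzr.lemmatize(word)
        for w, net1, net2, lch in syn(lemma):
            # collect the lemma names of this synset
            synonyms = [l.name() for l in net2.lemmas()]
            return_dict[word] = (set(synonyms), round(lch, 2))
    return return_dict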
Then, write a function to split the lines into chunks of some smaller size:
from itertools import islice

def grouper(n, iterable):
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, n))
        if not chunk:
            return
        yield chunk
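For example, a quick check of grouper on a small range (purely illustrative) shows that the last chunk may be shorter:

list(grouper(3, range(8)))
# [(0, 1, 2), (3, 4, 5), (6, 7)]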
Finally, call joblib's Parallel to pass each chunk of lines to your function:
from joblib import Parallel, delayed
import multiprocessing

num_cores = multiprocessing.cpu_count()  # or use n_jobs=-1 to use all cores
results = Parallel(n_jobs=num_cores)(
    delayed(my_func)(line_chunk) for line_chunk in grouper(500, lines))
results will then be a list of the items returned by my_func, and you can merge them however you like.
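For instance, if my_func returns a dict per chunk as above, one way to merge everything and reuse the csv_writer from your script (a sketch; later chunks simply overwrite earlier entries for repeated words):

merged = {}
for partial in results:
    merged.update(partial)
csv_writer(merged, "Text8_types_similar_lch.csv")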