I am trying to use multiprocessing for the code below. It seems to run a bit faster than the plain for loop inside the function.
How can I confirm that I am actually using the library and not just the for loop?
from multiprocessing import Pool
from multiprocessing import cpu_count
import requests
import pandas as pd
data = pd.read_csv('~/Downloads/50kNAE000.txt.1', sep="\t", header=None)
data = data[0].str.strip("0 ")
lst = []
def request(x):
    for i, v in x.items():
        print(i)
        file = requests.get(v)
        lst.append(file.text)
        #time.sleep(1)
if __name__ == "__main__":
    pool = Pool(cpu_count())
    results = pool.map(request(data))
    pool.close() # 'TERM'
    pool.join() # 'KILL'
Multiprocessing has overhead: it has to start the worker processes and transfer the function's arguments and results through an interprocess mechanism. Running a single function in another process is almost always going to be slower than calling that same function normally. The advantage comes from real parallelism, when each call does enough work that the overhead becomes small by comparison.
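To get a feel for that overhead, here is a minimal sketch that times a plain loop against a pool for a deliberately tiny task. The names square, run_serial and run_pool and the worker count of 4 are made up for illustration and are not from the question. For work this small I would expect the pool version to come out slower on most machines, but the exact numbers will vary.

```python
import time
from multiprocessing import Pool


def square(n):
    # Hypothetical, deliberately cheap task: sending it to a worker
    # costs more than computing it.
    return n * n


def run_serial(numbers):
    # Plain loop in the current process.
    return [square(n) for n in numbers]


def run_pool(numbers, workers=4):
    # Starts worker processes, ships the numbers to them, collects results.
    with Pool(workers) as pool:
        return pool.map(square, numbers)


if __name__ == "__main__":
    numbers = list(range(10_000))

    start = time.perf_counter()
    serial = run_serial(numbers)
    print(f"plain loop: {time.perf_counter() - start:.4f}s")

    start = time.perf_counter()
    parallel = run_pool(numbers)
    print(f"pool.map:   {time.perf_counter() - start:.4f}s")

    # Both approaches should produce the same list.
    print("same results:", serial == parallel)
```

If square is replaced by something that takes noticeable time per call (a network request, heavy computation), the balance should shift in favour of the pool.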
You can call multiprocessing.current_process().name inside the function to see the process name change.
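A minimal sketch of that check, under a few assumptions: fetch is a hypothetical stand-in for the request function (it does no network I/O so it can be run anywhere), and the URLs are placeholders. Note that in the question, request(data) is evaluated before pool.map is called, so the loop runs in the parent process; pool.map expects the function and the iterable as separate arguments. Also, a global list like lst is not shared between processes, so the sketch returns values instead of appending.

```python
from multiprocessing import Pool, current_process


def fetch(url):
    # Stand-in for requests.get(url).text: returns the name of the
    # process that handled this item, plus a dummy result.
    return current_process().name, len(url)


if __name__ == "__main__":
    # Placeholder URLs, not the ones from the question's file.
    urls = [f"http://example.com/page/{i}" for i in range(8)]

    # Called directly, the function runs in the parent process.
    print("direct call ran in:", fetch(urls[0])[0])

    # Passed to pool.map as (function, iterable), each item is handled
    # by one of the worker processes.
    with Pool(2) as pool:
        for name, size in pool.map(fetch, urls):
            print("pool call ran in:", name, "result:", size)
```

The direct call should report MainProcess, while the pool calls should report worker names (something like ForkPoolWorker-1 or SpawnPoolWorker-1, depending on the platform's start method). If every line shows MainProcess, the work is not going through the pool.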