I have created this simple code to check multiprocessing reading from a global dictionary object:
import numpy as np
import multiprocessing as mp
import psutil
from itertools import repeat
def computations_x( max_int ):
#random selection
mask_1 = np.random.randint( low=0, high=max_int, size=1000 )
mask_2 = np.random.randint( low=0, high=max_int, size=1000 )
exponent_1 = np.sqrt( np.pi )
vector_1 = np.array( [ read_obj[ k ]**( exponent_1 ) for k in mask_1 ] )
vector_2 = np.array( [ read_obj[ k ]**np.pi for k in mask_2 ] )
result = []
for j in range(100):
res_col = []
for i in range(100):
c = np.multiply( vector_1, vector_2 ).sum( axis=0 )
res_col.append(c)
res_col = np.array( res_col )
result.append( res_col )
result = np.array( result )
return result
global read_obj
total_items = 40000
max_int = 1000
keys = np.arange(0, max_int)
number_processors = psutil.cpu_count( logical=False )
#number_used_processors = 1
number_used_processors = number_processors - 1
number_tasks = number_used_processors
read_obj = { k: np.random.rand( 1000 ) for k in keys }
pool = mp.Pool( processes = number_used_processors )
args = list( repeat( max_int, number_tasks ) )
results = pool.map( computations_x, args )
pool.close()
pool.join()
However, when looking at CPU performance, I see that the CPU's are being switched by the OS when performing the computations. I am running on Ubuntu 18.04, is this normal behaviour when using Python's MP module? Here is what I observe in the system monitor when debugging the code (I am using Eclipse2019 for debugging)
Any help is appreciated, as in my main project I need to share a global "read only" object through processes in the same spirit as is done here, and I want to be sure this is not affecting performance really badly; I also want to make sure all tasks are executed concurrently within the Pool class. thanks.
I'd say that is the normal behaviour as the OS has to make sure that other processes are not starving for CPU time.
Here's a nice article on the OS scheduler basics: https://www.ardanlabs.com/blog/2018/08/scheduling-in-go-part1.html
It's focusing on Golang but the first part is pretty general.