I am trying to solve a non-linear optimization using the PyGMO package
The optimization class is defined separately and then called via a separate function dyn_optimGMO
. This optimization has to be done and saved for, say, 1000 random initial vectors defined by variable (nodes) inits ( or init_val)
Using timeit
module I found that each iteration takes approx 17 seconds
to complete. This means it will take approx 5 hours
for 1000 iterations
. This is a very large time.
If I have to repeat this for, say, 20 perturb
nodes then the total iterations go to 200000
which will take linear time as calculated above.
I tried solving this issue by using python multiprocessing
module to parallelize each set of 1000 iterations for each of the 20 perturb nodes. But this doesn't help.
I also tried using Numba jit functions but they don't recognize pyGMO modules and hence fail.
Is there any way to parallelize this code and make it faster for any number of iterations?
Please let me know if my question is clear enough, if not then I will add details as required.
import numpy as np
import pygmo as pg
matL = np.random.rand(300,300) ; node_len = 300
inits = []; results = []
perturb = {25:0} #setting a random node, say, node 25 to 0
class my_constrained_udp:
def __init__(self):
pass
def fitness(self, x):
matA = np.matrix(x)
obj1 = matA.dot(matL).dot(matA.T)[0,0] #this is int value
ce1 = sum(init_val) - sum(x)
return [obj1, ce1]
def n_objs(self): # no of objectives
return 1
def get_nec(self): #no of equality constraints
return 1
def get_nic(self): #no of in-equality constraints
return 0
def get_bounds(self): #lower and upper bounds: use this to perturb the node
lowerB = np.array([0]*node_len); upperB = np.array([1]*node_len)
if perturb:
for k,v in perturb.items():
lowerB[k] = v; upperB[k] = v
return (lowerB,upperB)
def gradient(self, x):
return pg.estimate_gradient_h(lambda x: self.fitness(x), x)
def dyn_optimGMO(matL, node_len ,init):
if perturb:
for k,v in perturb.items(): init_val[k] = v #setting perturbations in initial values
inner_algo = pg.nlopt("slsqp"); inner_algo.maxeval = 5
algo = pg.algorithm(uda = pg.mbh(inner_algo, stop = 2, perturb = .2))
#algo.set_verbosity(10) # in this case this correspond to logs each 1 call to slsqp
pop = pg.population(prob = my_constrained_udp(), size = 1000 , seed=123)
pop.problem.c_tol = [1E-6] * 1 # get_nec + get_nic = 1, so multiplied by 1
pop = algo.evolve(pop)
res = pop.champion_x
return res
# running above optimization code for 1000 random initializations
for i in range(1000):
init_val = np.array([random.uniform(0, 1) for k in range(node_len)])
if perturb:
for k,v in perturb.items(): init_val[k] = v #setting perturbations in initial values
res = dyn_optimGMO(matL ,node_len ,init_val) # this function is defined here only
inits.append(init_val); results.append(res)
EDIT 1:
As suggested below by @Ananda, I made the changes in the objective function which reduced the running time by almost 7x times. I have rewritten the code to run this code over 1000 iterations
using python multiprocessing
module. Below is my new code where I am trying to spawn processes to process iterations parallely. Since my system has only 8 threads so I have limited the Pool size to 5 only, because PyGMO uses internal parallelization as well and it needs some threads as well
import numpy as np
import pygmo as pg
matL = np.random.rand(300,300) ; node_len = 300
perturb = {12:1} # assign your perturb ID here
def optimizationFN(var):
results = []
inits = var[0]; perturb = var[1]
class my_constrained_udp:
def fitness(self, x):
obj1 = x[None,:] @ matL @ x[:,None] # @ is mat multiplication operator
ce1 = np.sum(inits) - np.sum(x)
return [obj1, ce1]
def n_objs(self): # no of objectives
return 1
def get_nec(self): #no of equality constraints
return 1
def get_nic(self): #no of in-equality constraints
return 0
def get_bounds(self): #lower and upper bounds: use this to perturb the node
lowerB = np.array([0]*node_len); upperB = np.array([1]*node_len)
if perturb:
for k,v in perturb.items():
lowerB[k] = v; upperB[k] = v
return (lowerB,upperB)
def gradient(self, x):
return pg.estimate_gradient_h(lambda x: self.fitness(x), x)
def dyn_optimGMO(matL, node_len ,inits):
'''
perturb should be a dict of node index and value as 0 or 1. Type(node_index) = int
'''
if perturb:
for k,v in perturb.items(): inits[k] = v #setting perturbations in initial values
inner_algo = pg.nlopt("slsqp"); inner_algo.maxeval = 5
algo = pg.algorithm(uda = pg.mbh(inner_algo, stop = 2, perturb = .2))
#algo.set_verbosity(10) # in this case this correspond to logs each 1 call to slsqp
pop = pg.population(prob = my_constrained_udp(), size = 100 , seed=123)
pop.problem.c_tol = [1E-6] * 1 # get_nec + get_nic = 1, so multiplied by 1
pop = algo.evolve(pop)
res = pop.champion_x
return res
if perturb:
for k,v in perturb.items(): inits[k] = v #setting perturbations in initial values
res = dyn_optimGMO(matL ,node_len ,inits) # this function is defined here only
results.append(res)
return results
import time
st = time.time()
#1000 random initialisations
initial_vals = []
for i in range(1000): initial_vals.append(np.array([random.uniform(0, 1) for k in range(node_len)]))
initial_vals = np.array(initial_vals)
inp_val = []
for i in range(len(initial_vals)): inp_val.append([initial_vals[i],perturb])
#running evaluations
#eqVal = optimizationFN(initial_vals,perturb=perturb)
from multiprocessing import Pool
myPool = Pool(8)
data = myPool.map(optimizationFN,inp_val)
myPool.close(); myPool.join()
print('Total Time: ',round(time.time()-st,4))
This executes the entire 1000 iterations in 1.13 hours
.
Still, is there any other possibility so that I can make it faster?
Before trying to parallelize etc. try to figure out what exactly is the bottleneck for the performance and try to fix that.
If you profile your fitness function using line profiler,
Line # Hits Time Per Hit % Time Line Contents
==============================================================
28 @profile
29 def fitness(self, x):
30 96105 3577978.0 37.2 9.5 matA = np.matrix(x)
31 96105 8548353.0 88.9 22.7 obj1 = matA.dot(matL).dot(matA.T)[0,0] #this is int value
32 96105 25328835.0 263.6 67.4 ce1 = sum(init_val) - sum(x)
33 96105 121800.0 1.3 0.3 return [obj1, ce1]
As you can see, most time is spent in the dot
and sum
function with a significant time also spent creating matA
.
I would rewrite the function like so -
def fitness(self, x):
obj1 = x[None, :] @ matL @ x[:, None]
ce1 = np.sum(init_val) - np.sum(x)
return [obj1, ce1]
If you profile ths function you can see,
Line # Hits Time Per Hit % Time Line Contents
==============================================================
20 @profile
21 def fitness(self, x):
22
23 77084 3151649.0 40.9 48.9 obj1 = x[None, :] @ matL @ x[:, None]
24 77084 3214012.0 41.7 49.9 ce1 = np.sum(init_val) - np.sum(x)
25
26 77084 79439.0 1.0 1.2 return [obj1, ce1]
The total time per hit for the full function went down from around 380 to 80.
np.matrix
method is recommended not to be used anymore and is going to deprecated. And using native python sum
instead of np.sum
can reduce the performance by a lot.
On my machine, it gives a performance improvement from 33 sec/it to 6 sec/iteration. Around 5x gain in performance.