I have some issues using lmfit in different processes to make my code faster. How can I define a shared array that holds the result of every fit?
I have a cube of data indexed by positions a and b, plus x and f(x). I made a model in lmfit that works nicely and fits f(x) for one point, returning some parameters. lmfit returns a ModelResult() object which contains all of these parameters and some useful extra data. So I need to run this fit for every a and b, and later build a cube with these parameters and maybe the extra data. I can run this sequentially (without parallelizing), but I have more than 1000 points and the model is complicated, so it takes more than 15000 seconds.
My problem starts when I use the multiprocessing library. I need to share data between the processes, so that when a process finishes it locks the variable, stores its result inside, and then unlocks the variable. The multiprocessing library has Value() and Array() for this. I intended to use Array, and made a change of variable from (a, b) to c = b + a*10, so that c indexes the flattened (a, b) grid. But I can't define an Array that keeps the ModelResult() for each c (see the sketch below of what Array does hold).
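(For reference, a minimal sketch of the kind of data mp.Array is built for: a fixed-size buffer of primitive values, e.g. 'd' for C doubles, that carries its own lock. The worker function and sizes here are just illustrative, not part of my actual code.)

import multiprocessing as mp

def worker(i, shared):
    # the Array carries its own lock; each worker writes one float into its slot
    with shared.get_lock():
        shared[i] = i * 0.5

if __name__ == '__main__':
    shared = mp.Array('d', 4)   # 'd' = C double; only primitive typecodes are allowed
    procs = [mp.Process(target=worker, args=(i, shared)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(list(shared))         # [0.0, 0.5, 1.0, 1.5]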
Here is the code:
import multiprocessing as mp
import numpy as np
import time
from lmfit import Model
from numpy import sqrt, exp, pi
#Set time zero
start_time = time.time()
#Example of functions to fit
def gaussian(x, amp, cen, wid):
    """1-d gaussian: gaussian(x, amp, cen, wid)"""
    return (amp / (sqrt(2*pi) * wid)) * exp(-(x-cen)**2 / (2*wid**2))

def linear(x, slope, intercept):
    """a linear function"""
    return slope*x + intercept

#Function to fit every point
def fit_point(a, b, data_cube, x, pars, mod):
    pos = int(b + (a*10))
    data_point = np.array(data_cube[:, a, b])
    #print(pos, mp.current_process().name)
    error_point = np.array((data_point*0) + 0.002)  #Example error
    res_point = mod.fit(data_point, pars, weights=1./error_point, x=x)
    print(res_point.fit_report())
    cube_res[pos] = res_point
    return  #cube_res[:]
#Invented some data
x=np.arange(10)
data_cube=np.random.rand(10, 10, 10)
#Example of a model with 2 gaussians and a line
mod = Model(linear, prefix='l_')+Model(gaussian, prefix='g1_')+Model(gaussian, prefix='g2_')
pars= mod.make_params()
pars['g1_amp'].set(0.5)
pars['g2_amp'].set(0.5)
pars['g1_cen'].set(2)
pars['g2_cen'].set(3)
pars['g1_wid'].set(0.5)
pars['g2_wid'].set(0.5)
pars['l_slope'].set(1)
pars['l_intercept'].set(1)
#Definition of the shared Array #Where I think there is the problem!
cube_res = mp.Array('u', 100)
#Definition of the process and starts
processes = []
for a in np.arange(0, 10):
    for b in np.arange(0, 10):
        process = mp.Process(target=fit_point, args=(a, b, data_cube, x, pars, mod))
        process.start()
        processes.append(process)
for process in processes:
    process.join()
print('Time count')
print("--- %s seconds ---" % (time.time() - start_time))
#Intent to print some results
#print(cube_res[20].fit_report)
#Final, to recover a,b
#final_cube_res = np.reshape(cube_res, (100,100))
The error is:
TypeError: unicode string expected instead of ModelResult instance
This is because I define mp.Array('u', 100), where 'u' means unicode and 100 is the length.
I don't know how to define the Array so that it can hold the ModelResult instances.
Thanks for reading!
The error message is telling you that you cannot put an lmfit.ModelResult into a string. In fact, your
cube_res = mp.Array('u', 100)
says that cube_res is an Array of 100 unicode characters. That is, each result would need to be 1 character long. I think what you want instead is a multiprocessing.Manager().dict to hold the results:
cube_res = mp.Manager().dict()
This will allow you to use the (x, y) position as the key, and you can then put the ModelResult as the value....
... but: you won't be able to save the ModelResult directly, because data shared between processes needs to be pickled, and in general complex objects, especially those with methods, are not easily picklable. The good news is that lmfit.ModelResult has a dumps() method that you can use to convert that object into a picklable JSON string. So your fit_point() function would then use
res_point = mod.fit(data_point, pars, weights=1./error_point, x=x)
print(res_point.fit_report())
cube_res[(a, b)] = res_point.dumps()
return
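(Putting those pieces together, a minimal sketch of what fit_point() could look like with the Manager dict; the cube_res name and the example error come from your code, and I'm passing the dict in explicitly rather than relying on a global, which is just one way to do it.)

def fit_point(a, b, data_cube, x, pars, mod, cube_res):
    # spectrum and (example) uncertainties at spatial position (a, b)
    data_point = np.array(data_cube[:, a, b])
    error_point = np.array((data_point*0) + 0.002)
    res_point = mod.fit(data_point, pars, weights=1./error_point, x=x)
    # store a picklable JSON dump, keyed by the (a, b) tuple
    cube_res[(a, b)] = res_point.dumps()

With that, the process creation would become mp.Process(target=fit_point, args=(a, b, data_cube, x, pars, mod, cube_res)).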
We're not done yet, because restoring the ModelResult from the dumped JSON string is slightly complicated (lmfit has helper functions for this, but these assume you've written the results to a file). You would have to do this after you've joined all your processes:

print('Time count')
print("--- %s seconds ---" % (time.time() - start_time))

import lmfit
# make a dummy ModelResult -- we'll overwrite everything for this
modres = lmfit.model.ModelResult(lmfit.Model(gaussian), lmfit.Parameters())

# make a dictionary of the functions you actually used (the function *names*
# are included in the dumped string, but not the functions themselves)
funcdefs = {'gaussian': gaussian, 'linear': linear}

for pos, dumpval in cube_res.items():
    modelresult = modres.loads(dumpval, funcdefs=funcdefs)
    print('### Result for fit ', pos)
    print(modelresult.fit_report())
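(If the final goal is the parameter cube mentioned in the question, one possible sketch, reusing modres and funcdefs from above and assuming the dict keys are the (a, b) tuples on a 10x10 grid, would be:)

# one 10x10 array of best-fit values per parameter name
param_cube = {name: np.full((10, 10), np.nan) for name in pars}
for (a, b), dumpval in cube_res.items():
    modelresult = modres.loads(dumpval, funcdefs=funcdefs)
    for name, val in modelresult.best_values.items():
        param_cube[name][a, b] = val
print(param_cube['g1_cen'])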
I think that should get you started...