Search code examples
pythonscipycurve-fittingscipy-optimizelmfit

Fitting a custom COBRApy model to data


I am trying to fit a model constructed using the cobra package to some data using curve_fit or limfit and I encounter a ValueError as shown in the minimal example below:

Here is a minimal cobra model construction:

import cobra
from cobra import Model, Reaction, Metabolite

#Create some metabolites
mc_e = Metabolite(
    'c_e',
    name = 'carbon source',
    compartment = 'e'
)
mc_c = Metabolite(
    'c_c',
    name = 'carbon source',
    compartment = 'c'
)

#Create some reactions
rxc = Reaction('C')
rxc.name = 'carbon uptake'
rxc.lower_bound = 0.
rxc.upper_bound = 1000.

rxd = Reaction('D')
rxd.name = 'carbon demand'
rxd.lower_bound = 0.
rxd.upper_bound = 1000.

#add metabolites to the reactions
rxc.add_metabolites({
    mc_e   : -1,
    mc_c   :  1
})
rxd.add_metabolites({
    mc_c   : -1,
})

#create a model
model = Model('test_model')

#add the reactions to the model
model.add_reactions([rxc, rxd])

# create and add exchange reactions to the model
model.add_boundary(model.metabolites.get_by_id("c_e") , type="exchange", ub=0); #only inflow (-)

Here is a simple function that modifies the above model with some input parameters, runs it, and returns one of the fluxes as output:

def fmodel (cmax, sc): # fitting model
    global model
    with model as model:
        rxc.add_metabolites({
        mc_e   : -1,
        mc_c   :  sc
        }, combine=False)
        rxc.bounds = (0, cmax) 
        model.objective = 'C'
        # Run FBA with the model
        solution = model.optimize()
        return solution.fluxes['D']

If we call the above function with some test parameters it works:

fmodel(2, 2) # returns 4.0

However, trying to fit the model using curve_fit(),

from scipy.optimize import curve_fit

# Provide initial parameters and their bounds
p0 = [1.5]  # initial guess
bounds = ( # lower and upper bounds
    [0], 
    [1000]) 
popt, pcov = curve_fit(fmodel, [1, 2, 3], [2, 4, 6], p0, bounds=bounds)

produces the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[30], line 7
      3 p0 = [1.5]  # initial guess
      4 bounds = ( # lower and upper bounds
      5     [0], 
      6     [1000])  
----> 7 popt, pcov = curve_fit(fmodel, [1, 2, 3], [2, 4, 6], p0, bounds=bounds)

File /usr/lib/python3/dist-packages/scipy/optimize/_minpack_py.py:800, in curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma, check_finite, bounds, method, jac, **kwargs)
    797 if 'max_nfev' not in kwargs:
    798     kwargs['max_nfev'] = kwargs.pop('maxfev', None)
--> 800 res = least_squares(func, p0, jac=jac, bounds=bounds, method=method,
    801                     **kwargs)
    803 if not res.success:
    804     raise RuntimeError("Optimal parameters not found: " + res.message)

File /usr/lib/python3/dist-packages/scipy/optimize/_lsq/least_squares.py:820, in least_squares(fun, x0, jac, bounds, method, ftol, xtol, gtol, x_scale, loss, f_scale, diff_step, tr_solver, tr_options, jac_sparsity, max_nfev, verbose, args, kwargs)
    817 if method == 'trf':
    818     x0 = make_strictly_feasible(x0, lb, ub)
--> 820 f0 = fun_wrapped(x0)
    822 if f0.ndim != 1:
    823     raise ValueError("`fun` must return at most 1-d array_like. "
    824                      "f0.shape: {0}".format(f0.shape))

File /usr/lib/python3/dist-packages/scipy/optimize/_lsq/least_squares.py:815, in least_squares.<locals>.fun_wrapped(x)
    814 def fun_wrapped(x):
--> 815     return np.atleast_1d(fun(x, *args, **kwargs))

File /usr/lib/python3/dist-packages/scipy/optimize/_minpack_py.py:485, in _wrap_func.<locals>.func_wrapped(params)
    484 def func_wrapped(params):
--> 485     return func(xdata, *params) - ydata

Cell In[24], line 8, in fmodel(cmax, sc)
      3 with model as model:
      4     rxc.add_metabolites({
      5     mc_e   : -1,
      6     mc_c   :  sc
      7     }, combine=False)
----> 8     rxc.bounds = (0, cmax) 
      9     model.objective = 'C'
     10     # Run FBA with the model

File /usr/local/lib/python3.10/dist-packages/cobra/util/context.py:107, in resettable.<locals>.wrapper(self, new_value)
    105 old_value = getattr(self, func.__name__)
    106 # Don't clutter the context with unchanged variables
--> 107 if old_value == new_value:
    108     return
    109 context(partial(func, self, old_value))

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This seems to be due to the tendency of the fitting procedures to pass vectorized data to the fit function where it accepts only a scalar. How can I avoid the problem it is causing in this case?


Solution

  • It turns out curve_fit() passes the entire independent variable data array as cmax along with a scalar fit parameter as sc in each call of the model function. So, I renamed the original fit model to fmodel_single and defined the wrapper below to handle the input array and return an array of outputs:

    def fmodel(cmax, sc):
        # This function applies fmodel_single to each element of cmax and sc
        return np.array([fmodel_single(c, sc) for c in cmax])
    

    Now curve_fit calling the wrapper function works successfully. Thanks to @jared for debug suggestion!