Search code examples
pythondataframenonlinear-optimizationscipy-optimizeequation-solving

Solve non linear equations with scipy NonlinearConstraint over pandas dataframe


I'm trying to solve the system of equations:

enter image description here

where a,b and c are columns from pandas dataframe. I used to work with Excel, where I run a macro with go seak in one column (residuals) by changing the values of the other columns (functions), but I don't know how to do that in Python, and I have asked here, but no answer yet.

I have tried fsolve, but the result is always my initial guess, so, I guess it needs some bounds, but fsolve doesn't allow the introduction of bounds, that's why I'm trying scipyn.NonlinearConstraint Below are the dataset and functions to solve:

import numpy as np
import pandas as pd
import scipy.optimize as opt
from scipy.optimize import NonlinearConstraint
a = np.linspace(300,400,30) 
b = np.random.randint(700,18000,30) 
c = np.random.uniform(1.4,4.0,30) 
df = pd.DataFrame({'A':a, 'B':b, 'C':c})


def func(zGuess,*Params):
    x,y,z = zGuess
    a,b,c = Params
    
    eq_1 = (((3.47-np.log10(y))**2+(np.log10(c)+1.22)**2)**0.5) - x
    eq_2 = ((a/101.32) * (101.32/b)** z) - y
    eq_3 = (0.381 * x + 0.05 * (b/101.32) -0.15) - z
    return eq_1,eq_2,eq_3


zGuess = np.array([2.6,20.2,0.92])
up_bound = (9,np.inf,1.1)
low_bound = (0,-np.inf,0)

and here I try to solve with fsolve:

df['x'],df['y'],df['z'] = zip(*df.apply(lambda x: opt.fsolve(func,zGuess,args=(x['A'],x['B'],x['C'])),1) )

and here with Nonlinearconstraint:

df['x'],df['y'],df['z'] = zip(*df.apply(lambda x: NonlinearConstraint(func(zGuess,x['A'],x['B'],x['C']),low_bound,up_bound),1) )

but with Nonlinearconstraint it gives an error:

TypeError: zip argument #1 must support iteration

I have even tried to minimize over the pandas dataframe:

def func2(zGuess,*Params):
    x,y = zGuess
    a,b,c,n = Params
    
    eq_1 = (((3.47-np.log10(y))**2+(np.log10(c)+1.22)**2)**0.5) - x
    eq_2 = ((a/101.32) * (101.32/b)** n) - y
    
    return eq_1,eq_2,eq_3



zGuess2 = np.array([2.6,20.2])
n = np.linspace(0,1.,500)
for i in n:
    df['x'],df['y'] = zip(*df.apply(lambda x: opt.fsolve(func2,zGuess2,args=(x['a'],x['b'],x['c'],i)),1))
    df['n_calc'] = df.apply(lambda x: (0.381 * x['x'] + 0.05 * (x['b']/101.32) - 0.15),1)
    res = np.abs(df['n_calc'] - i)
    if res <= 1e-5:
        n_solver = i
        df['n_solver'].append(n_solver)
    

I never thought something so simple like go seak in excel would give such difficulty to perform in python, I really need some help, please.


Solution

  • I'll ignore the pandas part of your question and will only address the optimization part. Ignoring all the pandas noise, your question is basically how to solve an equation with simple bounds on the variables. This has been answered several times, see here for example.

    Here are some hints:

    • Your bounds don't make sense, e.g. the logarithm is only defined for positive values, not for zero.

    • Please make yourself familiar with np.ndarrays. It will make your life much easier when working with scipy.optimize.

    • Your NonlinearConstraint doesn't make sense as well. This is an object that can be passed to scipy.optimize.minimize. It is not callable or an optimization method.

    • If you want to solve an equation with some bounds on the variables, you need to reformulate the problem as nonlinear optimization problem and solve it by means of scipy.optimize.minimize, see the above linked answer for more details. Note that I changed your func such that it returns a np.ndarray.

    Following is a working version that solves your equation for each row in your dataframe. Feel free to rewrite it such that you can use it inside df.apply().

    from scipy.optimize import minimize
    
    def func(zz, *params):
        x,y,z = zz
        a,b,c = params
        
        eq_1 = (((3.47-np.log10(y))**2+(np.log10(c)+1.22)**2)**0.5) - x
        eq_2 = ((a/101.32) * (101.32/b)** z) - y
        eq_3 = (0.381 * x + 0.05 * (b/101.32) -0.15) - z
        return np.array((eq_1,eq_2,eq_3))
    
    bounds = [(1e-6, None), (1e-6, None), (1e-6, None)]
    zGuess = np.array([2.6,20.2,0.92])
    
    # create the new columns
    df[["x", "y", "z"]] = np.zeros((30, 3))
    
    for i, row in df.iterrows():
        params = row.values[:3]
        res = minimize(lambda x: np.sum(func(x, *params)**2), zGuess, bounds=bounds)
        # res.x contains your solution
        # let's write the values into the dataframe
        for k, val in zip(['x', 'y', 'z'], res.x):
            df.loc[i, k] = val
    

    I never thought something so simple like go seak in excel would give such difficulty to perform in python

    Mixing several different concepts without reading the docs makes each task difficult in any arbitrary programming language.