Search code examples
pythonscipyscipy-optimizechemistrynonlinear-equation

Least_square inaccurate in chemical speciation


I'm having an issue using the least_squares function to solve a system of non-linear equation to obtain a chemical speciation (here of the system Nickel-ammonia). The system of equations is given here :

enter image description here

def equations(x):
    Ni0, Ni1, Ni2, Ni3, Ni4, Ni5, Ni6, NH3 = x
    eq1 = np.log10(Ni1)-np.log10(Ni0)-np.log10(NH3)-K1
    eq2 = np.log10(Ni2)-np.log10(Ni1)-np.log10(NH3)-K2
    eq3 = np.log10(Ni3)-np.log10(Ni2)-np.log10(NH3)-K3
    eq4 = np.log10(Ni4)-np.log10(Ni3)-np.log10(NH3)-K4
    eq5 = np.log10(Ni5)-np.log10(Ni4)-np.log10(NH3)-K5
    eq6 = np.log10(Ni6)-np.log10(Ni5)-np.log10(NH3)-K6
    eq7 = Ni0 + Ni1 + Ni2 + Ni3 + Ni4 + Ni5 + Ni6 - Ni_tot
    eq8 = Ni1 + 2*Ni2 + 3*Ni3 + 4*Ni4 + 5*Ni5 + 6*Ni6 - NH3_tot[i]
    
    return np.array([eq1, eq2, eq3, eq4, eq5, eq6, eq7, eq8])

I use a loop to calculate the values of the parameters at various NH3_tot, allowing to observe the variation of concentrations of the various species. The solving is done using the following :

# Arrays for data storage
NH3_tot = np.arange(0, 6, 0.1)
Ni0_values = []
Ni1_values = []
Ni2_values = []
Ni3_values = []
Ni4_values = []
Ni5_values = []
Ni6_values = []
Solutions = []

# Speciation determination with least_square at various NH3_tot
for i in range(len(NH3_tot)):
    x0 = [1,0.1,0.1,0.1,0.1,0.1,0.1,0.1]
    result = least_squares(equations, x0, bounds=([0,0,0,0,0,0,0,0],[Ni_tot,Ni_tot,Ni_tot,Ni_tot,Ni_tot,Ni_tot,Ni_tot,Ni_tot]))
    
    
    Solution = result.x
    Solutions.append(Solution)
    Ni0_value,Ni1_value,Ni2_value,Ni3_value,Ni4_value,Ni5_value,Ni6_value, NH3 = Solution
    
    # Extract concentrations of species
    Ni0_values.append(Ni0_value)
    Ni1_values.append(Ni1_value)
    Ni2_values.append(Ni2_value)
    Ni3_values.append(Ni3_value)
    Ni4_values.append(Ni4_value)
    Ni5_values.append(Ni5_value)
    Ni6_values.append(Ni6_value)

The issue I get is that for some values (generally the ones for low and high NH3_tot values), are completelly wrong, especially not respecting eq. 7, as the sum of the species exceeds Ni_tot.

I tried moving the bounds, but it does not seem to help, and the extrem values are generally wrong.

Printing the result always shows : message: xtol termination condition is satisfied. success: True

Indicating that the function effectively operates correctly.

Changing the chemical system results in similar output.

Changing the writting of the equations impacts greatly the result (If know why please help !!), and the use of np.log10 gives the most accurate, but still unsatisfactory.

Changing the method of resolution of the least_squares function does not improve the result.

Here is the complete script with all the variables:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import least_squares
import pandas as pd

Ni_tot = 1
K1 = 2.81
K2 = 2.27
K3 = 1.77
K4 = 1.27
K5 = 0.81
K6 = 0.15

# Define the system of nonlinear equations
def equations(x):
    Ni0, Ni1, Ni2, Ni3, Ni4, Ni5, Ni6, NH3 = x
    eq1 = np.log10(Ni1)-np.log10(Ni0)-np.log10(NH3)-K1
    eq2 = np.log10(Ni2)-np.log10(Ni1)-np.log10(NH3)-K2
    eq3 = np.log10(Ni3)-np.log10(Ni2)-np.log10(NH3)-K3
    eq4 = np.log10(Ni4)-np.log10(Ni3)-np.log10(NH3)-K4
    eq5 = np.log10(Ni5)-np.log10(Ni4)-np.log10(NH3)-K5
    eq6 = np.log10(Ni6)-np.log10(Ni5)-np.log10(NH3)-K6
    eq7 = Ni0 + Ni1 + Ni2 + Ni3 + Ni4 + Ni5 + Ni6 - Ni_tot
    eq8 = Ni1 + 2*Ni2 + 3*Ni3 + 4*Ni4 + 5*Ni5 + 6*Ni6 - NH3_tot[i]
    
    return np.array([eq1, eq2, eq3, eq4, eq5, eq6, eq7, eq8])

# Arrays for data storage
NH3_tot = np.arange(0, 14, 0.1)
Ni0_values = []
Ni1_values = []
Ni2_values = []
Ni3_values = []
Ni4_values = []
Ni5_values = []
Ni6_values = []
Solutions = []

# Speciation determination with least_square at various NH3_tot
for i in range(len(NH3_tot)):
    x0 = [0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1]
    result = least_squares(equations, x0, bounds=([0,0,0,0,0,0,0,0],[Ni_tot,Ni_tot,Ni_tot,Ni_tot,Ni_tot,Ni_tot,Ni_tot,Ni_tot]))
    print(result)
    
    Solution = result.x
    Solutions.append(Solution)
    Ni0_value,Ni1_value,Ni2_value,Ni3_value,Ni4_value,Ni5_value,Ni6_value, NH3 = Solution
    
    # Extract concentrations of species
    Ni0_values.append(Ni0_value)
    Ni1_values.append(Ni1_value)
    Ni2_values.append(Ni2_value)
    Ni3_values.append(Ni3_value)
    Ni4_values.append(Ni4_value)
    Ni5_values.append(Ni5_value)
    Ni6_values.append(Ni6_value)
    
    

# Conversion to DataFrame
results = pd.DataFrame(Solutions, columns=['Ni', 'Ni(NH3)', 'Ni(NH3)2', 'Ni(NH3)3', 'Ni(NH3)4', 'Ni(NH3)5', 'Ni(NH3)6', 'NH3'])

# Plot the concentrations
plt.figure(figsize=(10, 6))
plt.plot(NH3_tot, Ni0_values, label='Ni')
plt.plot(NH3_tot, Ni1_values, label='Ni1')
plt.plot(NH3_tot, Ni2_values, label='Ni2')
plt.plot(NH3_tot, Ni3_values, label='Ni3')
plt.plot(NH3_tot, Ni4_values, label='Ni4')
plt.plot(NH3_tot, Ni5_values, label='Ni5')
plt.plot(NH3_tot, Ni6_values, label='Ni6')


plt.xlabel('eq NH3')
plt.ylabel('Concentration')
plt.title('Concentration des espèces en fonction de NH3')
plt.legend()
plt.grid(True)
plt.show()

The purpose is to study the variation of distribution of species for various ligands, i.e. different K values. But if the speciation is not generated correctly, i cannot trust my data.

Edit after jlandercy helped : Using the correction of @jlandercy, we managed to lower the inacuracy of the curve shape, using the mathematical expression of proportion of each complexes (see alpha_i).

However there seem to be an issue regarding the "concentration" of Ligand. Indeed, it does not make sense physically to have the formation of a complex (Ni-NH3) only as a function of the ligand concentration, and the ratio NH3/Ni_tot should be the variable. The expected result would be the one given bellow : enter image description here With C_Ni_tot = 1 mol/L, and C_NH3_tot = 0 -> 12 mol/L.

Using the script given by @jlandercy with the following modification :

L = np.linspace(0, 0.2, 100)
As = alphas(pK, L)

I manage to obtain something of the same type as shown : enter image description here

However, there is still the problem of the "ligand scale", which is weird, and I don't see the issue.


Solution

  • Potential cause

    Optimization procedures in chemistry are inherently tricky because numbers involved span several decades leading to float errors and inaccuracies.

    You can see in your example than bounds are met but results are meaningless as they does not fulfill the mass balance (real constraint in eq7). Additionally problem occurs at the sides of the axis, where inaccuracies become prominent.

    Anyway it seems it can be numerically solved by tuning smartly the initial guess for the solver.

    Concentrations (numerical solution)

    I leave here a procedure to find the concentrations by solving the complex equilibria and mass balances simultaneously.

    Given your pK (which for complex are simply log10 of constants K without minus sign we usually find in pKa):

    pK = np.array([2.81, 2.27, 1.77, 1.27, 0.81, 0.15])
    K = np.power(10., pK)
    
    def system(x, K, xt, Lt):
        # (X, XL, XL2, XL3, XL4, XL5, XL6, L)
        n = len(K)
        return np.array([
            # Complexation equilibria:
            x[i + 1] - K[i] * x[i] * x[n + 1]
            for i in range(len(K))
        ] + [
            # Mass balance (Nickel):
            np.sum(x[:n + 1]) - xt,
            # Mass balance (Ammonium):
            np.sum(np.arange(n + 1) * x[:n + 1]) + x[-1] - Lt,
        ])
    
    Llin = np.linspace(0, 12, 2000)
    xt = 1.
        
    def solve(K, xt, Lts):
        sols = []
        for Lt in Lts:
            sol = optimize.fsolve(system, x0=[Lt * 1.1] * (len(K) + 2), args=(K, xt, Lt))
            sols.append(sol)
        C = np.array(sols)
        return C
    
    C = solve(K, xt, Llin)
    
    fig, axe = plt.subplots(figsize=(6,6))
    axe.plot(Ls, C[:,:len(K) + 1])
    axe.set_title("Complex Specie Concentrations w/ Acid/Base")
    axe.set_xlabel("Total concentration of Ligand, $L$ [mol/L]")
    axe.set_ylabel("Complex Specie Concentration, $x_i$ [mol/L]")
    axe.legend(["$x_{%d}$" % i for i in range(len(K) + 1)])
    axe.grid()
    

    It gives:

    enter image description here

    Which seems to quite agree with your reference figure.

    Partitions (analytical solution)

    Your problem resume to write partition functions of complex species (Nickel and Ammonia) for which an analytical solution exists. Therefore there is no need for optimization if we can rely on math.

    Your complexation problem is governed by:

    enter image description here

    Details on how this formulae can be derived are explained in this post.

    The key point is that partitions can be expressed uniquely by constants and free ligand concentration.

    Caution: in this solution the x-axis is the free ligand equilibrium concentration instead of the total concentration of ligand in the above figure.

    I leave here the procedure to compute such partition functions.

    We can write the partition functions (rational functions summing up to unity):

    def monom(i, K, L):
        return np.power(L, i) * np.prod(K[:i])
        
    def polynom(K, L):
        return np.sum([monom(i, K, L) for i in range(len(K) + 1)], axis=0)
    
    def alpha(i, K, L):
        return monom(i, K, L) / polynom(K, L)
    

    And compute them all at once:

    def alphas(K, L):
        return np.array([
            alpha(i, K, L)
            for i in range(len(K) + 1)
        ]).T
    

    Then we assess it for some concentration range:

    Llog = np.logspace(-5, 2, 2000)
    As = alphas(K, Llog)
    

    We check partition sums are unitary:

    np.allclose(np.sum(As, axis=1), 1.)  # True
    

    Which leads to:

    enter image description here

    From this analytical solution, we can compute total ligand concentration:

    Lt = np.sum(As * np.arange(len(K) + 1), axis=1) + Llog
    

    And we get the desired figure:

    fig, axe = plt.subplots()
    axe.plot(Lt, As)
    axe.set_title("Complex Partition Functions")
    axe.set_xlabel("Total Ligand Concentration, L [mol/L]")
    axe.set_ylabel(r"Partition Function, $\alpha_i$ [-]")
    axe.legend([r"$\alpha_{%d}$" % i for i in range(len(pK) + 1)])
    axe.grid()
    

    enter image description here

    Which also agrees with your reference figure. Personally I'll choose the analytical solution as it is more performant and does not have the risk to diverge on singularities.

    Comparison

    We can confirm both solutions agrees together:

    Ltlin = np.linspace(0, 12, 2000)
    Cs = solve(K, xt, Ltlin)
    Lf = Cs[:,-1]
    Cs = (Cs[:,:-1].T / np.sum(Cs[:,:-1], axis=1)).T
    As = alphas(K, Lf)
    

    enter image description here

    Where solid colored lines are analytical solutions and dashed blacked lines are numerical solutions.