r numerical-methods nonlinear-optimization chemistry nonlinear-equation

R nleqslv difficulties - solving for pH in an acid-base buffer

Goal

Build a theoretical titration curve for the phosphoric acid buffer (1M).

I provide a fully reproducible and self-contained example (of my failures ^.^).

Model equations

Acid-base equilibrium equations for phosphoric acid are:

Model implementation

Ka.1 <- 7.1 * 10^-3 
Ka.2 <- 6.3 * 10^-8
Ka.3 <- 4.5 * 10^-13
Kw <- 10^-14

balance <- function(vars, Na_ca, P_ca, convert.fun=function(x) x){
  # Apply positive only constraint
  vars <- convert.fun(vars)
  
  H <- vars[1]
  H3A <- vars[2]
  H2A <- vars[3]
  HA <- vars[4]
  A <- vars[5]
  Na <- convert.fun(Na_ca)

  eq.system <- c(H3A + H2A + HA + A - P_ca, 
                 H + Na - Kw/H - H2A - 2*HA - 3*A, 
                 H * H2A / Ka.1 - H3A,
                 H * HA  / Ka.2 - H2A, 
                 H * A   / Ka.3 - HA
                 )
  return(eq.system)
}

Notice that convert.fun is there to try different ways of forcing positive values on concentrations.

The return value is a vector of the model's equations, equated to zero (is this right?).

Iteration

I wished to solve the system for all possible Na+ concentrations, up to 3 equivalence "volumes".

I set initial conditions that woked for the lowest one: [Na]=0.

Then solved it with nleqslv and used the result to "seed" the next iteration.

And it seemed to work nicely:

But, on close inspection, the issues will become obvious.

But, before that, some code!

Setup initial conditions and results matrix:

P_ca <- 1
ci.start <- c(H=10^-1, H3A=0.9, H2A=0.1, HA=0.1, A=0.1)
Na.seq <- seq(from=0,to=3*P_ca,by=P_ca/1000)

varnames <- c("Na", "H",  "H3A", "H2A", "HA",  "A")

result.m <- matrix(ncol = length(varnames), nrow = length(Na.seq))
colnames(result.m) <- varnames

result.m[,1] <- Na.seq

Iteration:

convert.fun <- function(x) abs(x)

for(i in 1:length(Na.seq)){
  
  Na_ca <- result.m[i,1]
  
  if(i == 1){                   # Si es la primera iteración,
    ci <- ci.start              # usar los valores "start" como C.I.
  } else {                      # Si no,
    ci <- result.m[i-1, 2:6]    # usar los valores de la solución anterior
  }
  
  result <- nleqslv::nleqslv(x = ci, 
                             fn = balance, 
                             Na = Na_ca, P = P_ca,
                             convert.fun = convert.fun,
                             method="Newton",  #method="Broyden",
                             global="dbldog",
                             control = list(allowSingular=TRUE,
                                            maxit=1000))
  
  result$x <- convert.fun(result$x)
  
  result.m[i,2:6] <- result$x
  
  stopifnot(all(result$x >= 0))
  
} # END LOOP

result.df <- as.data.frame(result.m)

Notice that convert.fun is now abs(x) (is this ok?).

The problem

The problem with the last plot is that the right part of it is flattened out.

The problem is even more obvious in the following plot:

The red curve is supossed to end up at the top, and the purple one at the bottom. This seems to start happening at Na~2, but after a few more iterations, the result flattens out (and becomes exactly constant).

Possible hints for the savvy

The problem is a bit worse using method="Broyden" instead of "Newton".
nleqslv's return message changes from "Function criterion near zero" to "x-values within tolerance 'xtol'".
I also tried adding a Jacobian. That didnt change the result, but at the problematic point I get something like this:

    Chkjac  possible error in jacobian[2,1] =  2.7598836276240e+06
                             Estimated[2,1] =  1.1104869955110e+04

I am now really out of ideas! And would really appreciate some help or guidance.

Solution

You should always test the termination code of nleqslv to determine if a solution has been found. And somehow display the termination code and/or the message nleqslv returns. You will see that in some case no better point was found. Therefore any result is invalid and useless.

You are using so many values for Na.seq that it is impossible to the wood through the trees.

I would suggest starting with a very limited set of values for Na.seq. Something like

Na.seq <- seq(from=0,to=3*P_ca,by=P_ca/10)

and also this to include the termination code in the result

varnames <- c("Na", "H",  "H3A", "H2A", "HA",  "A", "termcd")

result.m <- matrix(ncol = length(varnames), nrow = length(Na.seq))

And then change the iteration loop to this

for(i in 1:length(Na.seq)){

  Na_ca <- result.m[i,1]

  if(i == 1){                   # Si es la primera iteración,
    ci <- ci.start              # usar los valores "start" como C.I.
  } else {                      # Si no,
    ci <- result.m[i-1, 2:6]    # usar los valores de la solución anterior
  }

  iter.trace <- 1
  cat("Iteration ",i,"\n\n")
  result <- nleqslv::nleqslv(x = ci, 
                             fn = balance, 
                             Na = Na_ca, P = P_ca,
                             convert.fun = convert.fun,
                             method="Newton",  #method="Broyden",
                             global="dbldog",
                             control = list(allowSingular=TRUE,
                                            maxit=1000,trace=iter.trace))
  cat("\n\n ",result$message,"\n\n") 
  result$x <- convert.fun(result$x)

  result.m[i,2:6] <- result$x
  result.m[i,7] <- result$termcd

  stopifnot(all(result$x >= 0))

} # END LOOP

and start analysing the output to find out what the problem is and where.

Addendum

I am reasonably sure that the difficulties with solving are (partly) caused by numerical difficulties. With the above modifications I changed the values for Ka.1, Ka.2, Ka.3,and Kw to

Ka.1 <- 7.1 * 10^-1
Ka.2 <- 6.3 * 10^-3
Ka.3 <- 4.5 * 10^-3
Kw <- 10^-3

and then there are no problems in finding a solution (all termination codes are 1). I suspect that the very small values for the K... constants are the cause of the problem. Check the system for possible errors or try to change the measurement units of the variables.