Inside the KrylovJacobian class from SciPy, there is this method:
def _update_diff_step(self):
    mx = abs(self.x0).max()
    mf = abs(self.f0).max()
    self.omega = self.rdiff * max(1, mx) / max(1, mf)
which would be the same as:

omega = rdiff * max(1, max(abs(x0))) / max(1, max(abs(f0)))

This modifies the step size that the finite difference method uses; however, I cannot find the origin of this expression or why it works.
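For context, here is roughly how omega ends up being used when the Jacobian-vector product is approximated by a forward difference. This is a simplified sketch of my own, not the exact SciPy source; the function and variable names here are mine.

import numpy as np

def approx_jv(func, x0, f0, v, omega):
    # forward-difference approximation of J(x0) @ v
    # the relative step omega is rescaled by the norm of v, so the
    # perturbation applied to x0 has magnitude on the order of omega
    nv = np.linalg.norm(v)
    if nv == 0:
        return 0 * v
    sc = omega / nv
    return (func(x0 + sc * v) - f0) / sc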
Does anybody know the source of this method or the reasoning behind it?
Reading the journal articles which SciPy cites on the documentation page,* I cannot find any choice of omega which is exactly equivalent to what SciPy is doing. However, there are a couple of cases which are similar.
Reading D.A. Knoll and D.E. Keyes, J. Comp. Phys. 193, 357 (2004), DOI:10.1016/j.jcp.2003.08.010, one of the articles SciPy cites, I found a high-level rationale for SciPy's choices:
As shown above, the Jacobian-vector product approximation is based on a Taylor series expansion. Here, we discuss various options for choosing the perturbation parameter, ε in Eq. (10), [Editor's note: this is the variable which SciPy calls omega, divided by the norm of v.] which is obviously sensitive to scaling, given u and v. If ε is too large, the derivative is poorly approximated and if it is too small the result of the finite difference is contaminated by floating-point roundoff error. The best ε to use for a scalar finite-difference of a single argument can be accurately optimized as a balance of these two quantifiable trade-offs. However, the choice of ε for the vector finite difference is as much of an art as a science.
So there are two concerns being balanced here:

- If the step is too large, the difference between f(x) and f(x + step) is a poor approximation of the local derivative.
- If the step is too small, it is swamped by floating-point roundoff when x + step is computed.

Ideally, to address the first concern, you would look at the second derivative of the function. However, we don't know the first or second derivative of the function; that's the whole point of finding the step size. I think that it is looking at the size of f(x) as the next best thing: if f(x) is big, then either the user put in a really bad guess for x when they started the solver, or this is an area where the function changes rapidly.

Roundoff is addressed similarly: if x is big, then the step will be big as well.
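To see the two error sources in isolation, here is a small scalar experiment (purely illustrative, not taken from SciPy): the forward-difference error for exp(x) is dominated by the approximation error for large steps and by roundoff for tiny steps, with the sweet spot near the square root of machine epsilon.

import numpy as np

x = 1.7
exact = np.exp(x)  # d/dx exp(x) = exp(x)

for step in (1e-1, 1e-4, 1e-8, 1e-12, 1e-15):
    approx = (np.exp(x + step) - np.exp(x)) / step
    print(f"step={step:.0e}  error={abs(approx - exact):.2e}")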
SciPy's method is most similar to equation (11) from the same paper, written out in code form below. In that equation, n represents the number of dimensions of the problem, u represents the point at which we are evaluating the Jacobian, v represents the direction of the Jacobian-vector product, and b represents an arbitrary constant which is approximately the square root of machine epsilon. (Note: this is similar to self.rdiff, which defaults to the square root of machine epsilon.)
We can algebraically manipulate this to find the similarities and differences between this and SciPy's formula.
# difference between SciPy's omega and Knoll's epsilon
epsilon = omega / norm(v)
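# equation (11), as described above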
epsilon = 1/(n*norm(v)) * (sum(b * abs(u[i]) for i in range(n)) + b)
# Combine the two equations
omega / norm(v) = 1/(n*norm(v)) * (sum(b * abs(u[i]) for i in range(n)) + b)
# Multiply both sides by norm of v
omega = 1/n * (sum(b * abs(u[i]) for i in range(n)) + b)
# Factor out b
omega = b/n * (sum(abs(u[i]) for i in range(n)) + 1)
# Move n into parens
omega = b * (sum(abs(u[i]) for i in range(n))/n + 1/n)
# Recognize sum(...)/n as mean
omega = b * (mean(abs(u)) + 1/n)
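As a quick numerical sanity check of that manipulation (purely illustrative, with arbitrary test vectors), the simplified form agrees with equation (11) once both sides are multiplied by the norm of v:

import numpy as np

rng = np.random.default_rng(0)
n = 5
u = rng.normal(size=n)          # stand-in for the current point
v = rng.normal(size=n)          # stand-in for the direction
b = np.sqrt(np.finfo(float).eps)

# equation (11), written as above
epsilon = 1 / (n * np.linalg.norm(v)) * (np.sum(b * np.abs(u)) + b)

# the simplified form
omega = b * (np.mean(np.abs(u)) + 1 / n)

print(np.isclose(epsilon * np.linalg.norm(v), omega))  # True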
This is somewhat similar to what SciPy is doing, except:

- It adds 1/n rather than using max(1, ...).
- Nothing in it depends on f(x).

There are two step sizes which must be avoided at all costs: zero and infinity. A step of zero causes a division by zero, and a step of infinity tells us nothing about the local Jacobian. This is a problem, because both x and f(x) can be zero. The max(1, ...) terms are likely placed there to avoid this.
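To make that concrete, here is a small illustration of my own (not from SciPy) of what the max(1, ...) guards buy you when x and f(x) are zero:

import numpy as np

rdiff = np.sqrt(np.finfo(float).eps)

def omega_guarded(x0, f0):
    # SciPy-style expression, with the max(1, ...) guards
    mx = np.abs(x0).max()
    mf = np.abs(f0).max()
    return rdiff * max(1, mx) / max(1, mf)

def omega_unguarded(x0, f0):
    # the same expression without the guards
    return rdiff * np.abs(x0).max() / np.abs(f0).max()

zero = np.zeros(3)
print(omega_guarded(zero, zero))    # finite: equals rdiff
print(omega_unguarded(zero, zero))  # 0/0 -> nan (and a zero step if only x0 is zero)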
I can't find a pre-existing journal paper which takes exactly the same approach. I suspect that this expression is simply a heuristic which is experimentally justified and works in practice.
*Note: I only read the papers by D.A. Knoll and D.E. Keyes, and by A.H. Baker, E.R. Jessup, and T. Manteuffel. The first reference on that page was added after omega was chosen, so I did not read it.