Tags: python, matlab, scipy-optimize

Difference in result between fmin and fminsearch in Matlab and Python


My objective is to perform an Inverse Laplace Transform on some decay data (NMR T2 decay via CPMG). For that, we were provided with the CONTIN algorithm. This algorithm was adapted to Matlab by Iari-Gabriel Marino, and it works very well. I want to adapt this code to Python. The core of the problem is scipy.optimize.fmin, which does not minimize the mean square deviation (MSD) anywhere near as well as Matlab's fminsearch: the latter converges to a good minimum, while the former doesn't.

I have gone through my adapted Python code line by line against the original Matlab, checking every matrix and every output, and that is how I identified that the critical point is the fmin call. I also tried scipy.optimize.minimize and other minimization algorithms, but none gave even remotely satisfactory results.

I have made two MWEs, one for Python and one for Matlab, so that anyone can reproduce this. The example data were obtained from the documentation of the Matlab function. Apologies for the length of the code, but I don't really know how to shorten it without sacrificing readability and clarity; I tried to make the lines match as closely as possible. I am using Python 3.7.3, scipy 1.3.0, numpy 1.16.2, and Matlab R2018b on Windows 8.1, with a relatively recent Anaconda install (<2 months old).

My code:

import numpy as np
from scipy.optimize import fmin
import matplotlib.pyplot as plt

def msd(g, y, A, alpha, R, w, constraints):
    """ msd: mean square deviation. This is the function to be minimized by fmin"""
    if 'zero_at_extremes' in constraints:
        g[0] = 0
        g[-1] = 0
    if 'g>0' in constraints:
        g = np.abs(g)

    r = np.diff(g, axis=0, n=2)
    yfit = A @ g
    # Sum of weighted square residuals
    VAR = np.sum(w * (y - yfit) ** 2)
    # Regularizor
    REG = alpha ** 2 * np.sum((r - R @ g) ** 2)
    # output to be minimized
    return VAR + REG

# Objective: match this distribution
g0 = np.array([0, 0, 10.1625, 25.1974, 21.8711, 1.6377, 7.3895, 8.736, 1.4256, 0, 0]).reshape((-1, 1))
s0 = np.logspace(-3, 6, len(g0)).reshape((-1, 1))
t = np.linspace(0.01, 500, 100).reshape((-1, 1))
sM, tM = np.meshgrid(s0, t)
A = np.exp(-tM / sM)
np.random.seed(1)
# Creates data from the initial distribution with some random noise.
data = (A @ g0) + 0.07 * np.random.rand(t.size).reshape((-1, 1))

# Parameters and function start
alpha = 1E-2  # regularization parameter
s = np.logspace(-3, 6, 20).reshape((-1, 1)) # x of the ILT
g0 = np.ones(s.size).reshape((-1, 1))        # guess of y of ILT
y = data                                    # noisy data
options = {'maxiter':1e8, 'maxfun':1e8}     # for the fmin function
constraints=['g>0', 'zero_at_extremes']     # constraints for the MSD function
R=np.zeros((len(g0) - 2, len(g0)), order='F')  # Regularizor
w=np.ones(y.reshape(-1, 1).size).reshape((-1, 1)) # Weights
sM, tM = np.meshgrid(s, t, indexing='xy')
A = np.exp(-tM/sM)
g0 = g0 * y.sum() / (A @ g0).sum()  # Makes a "better guess" for the distribution, according to algorithm

print('msd of input data:\n', msd(g0, y, A, alpha, R, w, constraints))

for i in range(5):  # Just for testing. If this is extremely high, ~1000, it's still bad.
    g = fmin(func=msd,
             x0 = g0, 
             args=(y, A, alpha, R, w, constraints), 
             **options, 
             disp=True)[:, np.newaxis]
    msdfit = msd(g, y, A, alpha, R, w, constraints)
    if 'zero_at_extremes' in constraints:
            g[0] = 0
            g[-1] = 0
    if 'g>0' in constraints:
            g = np.abs(g)

    g0 = g

print('New guess', g)
print('Final msd of g', msdfit)

# Visualize the fit
plt.plot(s, g, label='Initial approximation')
plt.plot(np.logspace(-3, 6, 11), np.array([0, 0, 10.1625, 25.1974, 21.8711, 1.6377, 7.3895, 8.736, 1.4256, 0, 0]), label='Distribution to match')
plt.xscale('log')
plt.legend()
plt.show()

Matlab:

% Objective: match this distribution
g0 = [0 0 10.1625 25.1974 21.8711 1.6377 7.3895 8.736 1.4256 0 0]';
s0  = logspace(-3,6,length(g0))';
t = linspace(0.01,500,100)';
[sM,tM] = meshgrid(s0,t);
A = exp(-tM./sM);
rng(1);
% Creates data from the initial distribution with some random noise.
data = A*g0 + 0.07*rand(size(t));

% Parameters and function start
alpha = 1e-2; % regularization parameter
s = logspace(-3,6,20)'; % x of the ILT
g0 = ones(size(s)); % initial guess of y of ILT
y = data; % noisy data
options = optimset('MaxFunEvals',1e8,'MaxIter',1e8); % options for fminsearch
constraints = {'g>0','zero_at_the_extremes'}; % constraints for MSD
R = zeros(length(g0)-2,length(g0));
w = ones(size(y(:)));
[sM,tM] = meshgrid(s,t);
A = exp(-tM./sM);
g0 = g0*sum(y)/sum(A*g0); % Makes a "better guess" for the distribution

disp('msd of input data:')
disp(msd(g0, y, A, alpha, R, w, constraints))

for k = 1:5
    [g,msdfit] = fminsearch(@msd,g0,options,y,A,alpha,R,w,constraints);
    if ismember('zero_at_the_extremes',constraints)
        g(1) = 0;
        g(end) = 0;
    end
    if ismember('g>0',constraints)
        g = abs(g);
    end
    g0 = g;
end

disp('New guess')
disp(g)
disp('Final msd of g')
disp(msdfit)

% Visualize the fit
semilogx(s, g)
hold on
semilogx(logspace(-3,6,11), [0 0 10.1625 25.1974 21.8711 1.6377 7.3895 8.736 1.4256 0 0])
legend('First approximation', 'Distribution to match')
hold off
function out = msd(g,y,A,alpha,R,w,constraints)
% msd: The mean square deviation; this is the function
% that has to be minimized by fminsearch

% Constraints and any 'a priori' knowledge
if ismember('zero_at_the_extremes',constraints)
    g(1) = 0;
    g(end) = 0;
end
if ismember('g>0',constraints)
    g = abs(g); % must be g(i)>=0 for each i
end
r = diff(diff(g(1:end))); % second derivative of g
yfit = A*g;
% Sum of weighted square residuals
VAR = sum(w.*(y-yfit).^2);
% Regularizor
REG = alpha^2 * sum((r-R*g).^2);
% Output to be minimized
out = VAR+REG;
end

[Figure: the optimization result in Python]

[Figure: the optimization result in Matlab]

I have checked the MSD of g0 before starting, and both give the value 2651. After minimization, Python goes up to 4547, while Matlab goes down to 0.1381.

I think the problem is one of the following. Either it's in my implementation, that is, I am using fmin wrong or I got some other step wrong, but I can't figure out what; the fact that the MSD increases when a minimization function should decrease it is damning. Or Matlab's implementation is genuinely different from scipy's: according to its documentation, Matlab's fminsearch uses the Nelder-Mead method described by Lagarias et al., while scipy uses the original Nelder-Mead algorithm. Maybe that matters significantly? Or perhaps my initial guess is too poor for scipy's algorithm?
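
For reference, this is roughly the kind of call I made when I said above that I also tried scipy.optimize.minimize (a minimal sketch; the explicit Nelder-Mead option values here are illustrative assumptions, not part of my original code):

from scipy.optimize import minimize

# Sketch with assumed option values: call scipy's Nelder-Mead explicitly,
# with tight tolerances, generous iteration limits, and the adaptive
# simplex-parameter variant, to check whether early termination or the
# simplex parameters make the difference.
res = minimize(msd,
               x0=g0.ravel(),  # minimize works with a 1-D starting point
               args=(y, A, alpha, R, w, constraints),
               method='Nelder-Mead',
               options={'xatol': 1e-8, 'fatol': 1e-8,
                        'maxiter': int(1e6), 'maxfev': int(1e6),
                        'adaptive': True})
g = res.x[:, np.newaxis]
print(res.fun)  # final MSD reported by the optimizer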


Solution

  • So, it's been quite a long time since I posted this, but I wanted to share what I ended up learning and doing.

    The Inverse Laplace Transform for CPMG data is a bit of a misnomer; it's more properly called just an inversion. The general problem is solving a Fredholm integral of the first kind. One way of doing this is the Tikhonov regularization method. It turns out you can describe this problem quite easily with numpy and solve it with an existing scipy routine, so I didn't have to reinvent the wheel.
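
    To be explicit about what the code below sets up (my notation, not taken from that post): writing K for the kernel matrix, d for the measured decay, and alpha for the regularization weight, the problem being solved is

        minimize over g >= 0:   ||K g - d||^2 + alpha^2 ||g||^2

    which is identical to the ordinary non-negative least-squares problem on the stacked system

        minimize over g >= 0:   || [K; alpha*I] g - [d; 0] ||^2

    so a single call to a non-negative least-squares solver on the stacked matrices solves the regularized problem.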

    I used the solution shown in this post, and the names here reflect that solution.

    import numpy as np
    from scipy.optimize import nnls

    def tikhonov_regularized_inversion(
        kernel: np.ndarray, alpha: float, data: np.ndarray
    ) -> np.ndarray:
        data = data.reshape(-1, 1)
        # Stack the kernel on top of an alpha-scaled identity block,
        # and the data on top of a matching block of zeros.
        I = alpha * np.eye(*kernel.shape)
        C = np.concatenate([kernel, I], axis=0)
        d = np.concatenate([data, np.zeros_like(data)])
        # Non-negative least squares on the stacked system.
        x, _ = nnls(C, d.flatten())
        return x

    Here, kernel is a matrix containing all the possible exponential decay curves, and the solution estimates the contribution of each decay curve to the data I received. First, I reshape my data into a column and pad it with zeros, creating the vector d. I then stack my kernel on top of a matrix of the same shape that has the regularization parameter alpha along its diagonal. Last, I call nnls, the non-negative least-squares solver in scipy.optimize, because there's no reason for a curve to have a negative contribution, only no contribution.
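
    As a rough usage sketch (my own addition, reusing the grids and the noisy data from the MWE in my question; the variable names are assumptions carried over from there):

    import numpy as np

    # Sketch only: rebuild the kernel from the same s/t grids as the MWE above;
    # each column is one candidate decay curve exp(-t/s_j).
    t = np.linspace(0.01, 500, 100).reshape((-1, 1))
    s = np.logspace(-3, 6, 20).reshape((-1, 1))
    sM, tM = np.meshgrid(s, t)
    kernel = np.exp(-tM / sM)

    # 'data' is the noisy decay generated in the MWE; alpha as before.
    g = tikhonov_regularized_inversion(kernel, alpha=1e-2, data=data)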

    This solved my problem, and it's quick and convenient.