Search code examples
matlabmle

mle memory error with custom negative log-likelihood function


I am trying to use 'mle' with a custom negative log-likelihood function, but I get the following error:

Requested 1200000x1200000 (10728.8GB) array exceeds maximum array size preference (15.6GB). This might cause MATLAB to become unresponsive.

The data I am using is a 1x1200000 binary array (which I had to convert to double), and the function has 10 arguments: one for the data, 3 known paramenters, and 6 to be optimized. I tried setting 'OptimFun' to both 'fminsearch' and 'fmincon'. Also, optimizing the parameters using 'fminsearch' and 'fminunc' instead of 'mle' works fine.
The problem happens in the 'checkFunErrs' functions, inside the 'mlecustom.m' file (call at line 173, actuall error at line 705).
With 'fminunc' I could calculate the optimal parameters, but it does not give me confidence intervals. Is there a way to circumvent this? Or am I doing something wrong?
Thanks for the help.

T_1 = 50000;
T_2 = 100000;
npast = 10000;
start = [0 0 0 0 0 0];
func = @(x, data, cens, freq)loglike(data, [x(1) x(2) x(3) x(4) x(5) x(6)],...
                        T_1, T_2, npast);
params = mle(data, 'nloglf', func, 'Start', start, 'OptimFun', 'fmincon');
% Computes the negative log likehood
function out = loglike(data, params, T_1, T_2, npast)
    size = length(data);
    if npast == 0
        past = 0;
    else
        past = zeros(1, size);
        past(npast+1:end) = movmean(data(npast:end-1),[npast-1, 0]); % Average number of events in the previous n years
    end
    lambda = params(1) + ...
        (params(2)*cos(2*pi*(1:size)/T_1)) + ...
        (params(3)*sin(2*pi*(1:size)/T_1)) + ...
        (params(4)*cos(2*pi*(1:size)/T_2)) + ...
        (params(5)*sin(2*pi*(1:size)/T_2)) + ...
        params(6)*past;
    out = sum(log(1+exp(lambda))-data.*lambda);
end

Solution

  • Your issue is line 228 (as of MATLAB R2017b) of the in-built mle function, which happens just before the custom function is called:

    data = data(:);
    

    The input variable data is converted to a column array without warning. This is typically done to ensure that all further calculations are robust to the orientation of the input vector.

    However, this is causing you issues, because your custom function assumes data is a row vector, specifically this line:

    out = sum(log(1+exp(lambda))-data.*lambda);
    

    Due to implicit expansion, when the row vector lambda and the column vector data interact, you get a huge square matrix per your error message.

    Adding these two lines to make it explicit that both are column vectors resolves the issue, avoids implicit expansion, and applies the calculation element-wise as you intended.

    lambda = lambda(:);
    data = data(:);
    

    So your function becomes

    function out = loglike(data, params, T_1, T_2, npast)
        N = length(data);
        if npast == 0
            past = 0;
        else
            past = zeros(1,N);
            past(npast+1:end) = movmean(data(npast:end-1),[npast-1, 0]); % Average number of events in the previous n years
        end
        lambda = params(1) + ...
            (params(2)*cos(2*pi*(1:N)/T_1)) + ...
            (params(3)*sin(2*pi*(1:N)/T_1)) + ...
            (params(4)*cos(2*pi*(1:N)/T_2)) + ...
            (params(5)*sin(2*pi*(1:N)/T_2)) + ...
            params(6)*past;
        lambda = lambda(:);
        data = data(:);
        out = sum(log(1+exp(lambda))-data.*lambda);
    end
    

    An alternative would be to re-write your function so that it uses column vectors, but you create new row vectors with the (1:N) steps and the concatenation within the movmean. The suggested approach is arguably "lazier", but also robust to row or column inputs.

    Note also I've changed your variable name from size to N, since size is an in-built function which you should avoid shadowing.