I am designing a new algorithm which needs to run the EM algorithm partially. I am using MATLAB's (R2015b) fitgmdist for this.
I observe a difference between the solutions obtained by (1) running a large number of iterations at once and (2) running the same number of iterations one by one. Note that both versions use the same starting point, the number of replicates is left at 1 (default), and RegularizationValue is left at its default (0).
So where does the difference come from?
Here is the code that demonstrates the issue:
mu1 = [1 1];
Sigma1 = [2 0; 0 0.5];
mu2 = [1 -1];
Sigma2 = [1 0;0 1];
rng(20); % For reproducibility
X = [mvnrnd(mu1,Sigma1,1000);mvnrnd(mu2,Sigma2,1000)];
start = [];
start.mu = [X(randi(size(X,1)),:); X(randi(size(X,1)),:)];
start.Sigma = [];
start.Sigma = cat(3, start.Sigma, Sigma1+rand(1));
start.Sigma = cat(3, start.Sigma, Sigma2+rand(1));
% run 100 iterations
GMModel = fitgmdist(X,2,'Options',statset('MaxIter',100),'Start',start);
% now run 100 iterations one by one
for i=1:100
    GMModel2 = fitgmdist(X,2,'Options',statset('MaxIter',1),'Start',start);
    % set the start to the result after 1 iteration
    start.mu = GMModel2.mu;
    start.Sigma = GMModel2.Sigma;
    start.ComponentProportion = GMModel2.ComponentProportion;
end
GMModel
% GMModel =
%
% Gaussian mixture distribution with 2 components in 2 dimensions
% Component 1:
% Mixing proportion: 0.470964
% Mean: 0.9345 0.9932
%
% Component 2:
% Mixing proportion: 0.529036
% Mean: 1.0809 -0.8807
GMModel2
% GMModel2 =
%
% Gaussian mixture distribution with 2 components in 2 dimensions
% Component 1:
% Mixing proportion: 0.481425
% Mean: 0.93994 0.98135
%
% Component 2:
% Mixing proportion: 0.518575
% Mean: 1.0788 -0.90749
EDIT: One thing that I did not check before was the number of iterations used for GMModel (when MaxIter is set to 100). It stopped after 74 iterations.
GMModel.NumIterations
%ans =
% 74
The negative log-likelihood at iteration 74 of the one-iteration-at-a-time run is the same as that of the MaxIter 100 run. At iteration 75 it drops by ~0.006. So another question arises: why did the MaxIter 100 version stop at iteration 74 when the negative log-likelihood still drops by more than the tolerance, which is 1e-6?
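For reference, the ~0.006 figure can be reproduced by recording the negative log-likelihood inside the one-by-one loop. This is only a sketch: it assumes start has been re-initialised to the same starting values used above.
nll = zeros(100,1);
for i = 1:100
    GMModel2 = fitgmdist(X,2,'Options',statset('MaxIter',1),'Start',start);
    nll(i) = GMModel2.NegativeLogLikelihood; % negative log-likelihood after this single EM iteration
    % carry the fitted parameters over as the next starting point
    start.mu = GMModel2.mu;
    start.Sigma = GMModel2.Sigma;
    start.ComponentProportion = GMModel2.ComponentProportion;
end
nll(74) - nll(75) % drop from iteration 74 to 75, ~0.006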
The stopping is likely related to the convergence check in MATLAB/R201Xy/toolbox/stats/stats/@gmdistribution/private/gmcluster.m, about halfway through gmcluster_learn:
%check if it converges
llDiff = ll-ll_old;
if llDiff >= 0 && llDiff < options.TolFun *abs(ll)
    optimInfo.Converged=true;
    break;
end
ll_old = ll;
where ll is set via [ll,post] = estep(log_lh);. Near the top of the function, ll_old is initialised to -inf.
When you run all the iterations in one call, llDiff shrinks as the fit converges and the check eventually passes. Note that the threshold is options.TolFun*abs(ll), i.e. a tolerance relative to the magnitude of the log-likelihood rather than an absolute 1e-6; with abs(ll) in the thousands here, an improvement of ~0.006 can still fall below it, which is presumably why the MaxIter 100 run stops at iteration 74.
When you run one iteration at a time, ll_old is reset to -inf at the start of every fitgmdist call, so llDiff is effectively infinite and the convergence check always fails; each call therefore performs its single EM update, and the one-by-one run keeps improving past iteration 74.
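Using the nll values recorded in the sketch above, you can apply the same test to see where the all-at-once run would declare convergence (again a sketch; it assumes the default TolFun of 1e-6):
tolFun = 1e-6;
ll = -nll;                        % log-likelihood after each iteration
llDiff = diff(ll);                % improvement over the previous iteration
relTol = tolFun * abs(ll(2:end)); % threshold actually used: TolFun*abs(ll), not 1e-6 alone
firstConverged = find(llDiff >= 0 & llDiff < relTol, 1) + 1 % should match GMModel.NumIterations (74)
% In the one-by-one runs, ll_old is -inf inside every call, so llDiff is
% infinite there and this test can never succeed.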