ZIP - Hidden Markov model r Stan

I'm trying to adjust a Zero Inflated Poisson Hidden Markov Model with Stan. For the Poisson-HMM in a past forum this setting was shown. see link.

While to adjust the ZIP with the classical theory is well documented the code and model.

ziphsmm

library(ziphsmm)
set.seed(123)
prior_init <- c(0.5,0.5)
emit_init <- c(20,6)
zero_init <- c(0.5,0)
tpm <- matrix(c(0.9, 0.1, 0.2, 0.8),2,2,byrow=TRUE)
result <- hmmsim(n=100,M=2,prior=prior_init, tpm_parm=tpm,emit_parm=emit_init,zeroprop=zero_init)
y <- result$series
serie <- data.frame(y = result$series, m = result$state)

fit1 <-  fasthmmfit(y,x=NULL,ntimes=NULL,M=2,prior_init,tpm,
                    emit_init,0.5, hessian=FALSE,method="BFGS", 
                    control=list(trace=1))
fit1
$prior
            [,1]
[1,] 0.997497445
[2,] 0.002502555

$tpm
          [,1]       [,2]
[1,] 0.9264945 0.07350553
[2,] 0.3303533 0.66964673

$zeroprop
[1] 0.6342182

$emit
          [,1]
[1,] 20.384688
[2,]  7.365498

$working_parm
[1] -5.9879373 -2.5340475  0.7065877  0.5503559  3.0147840  1.9968067

$negloglik
[1] 208.823

Stan

library(rstan)

ZIPHMM <- 'data {
    int<lower=0> N;
    int<lower=0> y[N];
    int<lower=1> m;
  }

parameters {
    real<lower=0, upper=1> theta; //
    positive_ordered[m] lambda; //
    simplex[m] Gamma[m]; // tpm
  }

model {
  vector[m] log_Gamma_tr[m];
  vector[m] lp;
  vector[m] lp_p1;

  // priors
  lambda ~ gamma(0.1,0.01);
  theta ~ beta(0.05, 0.05);

  // transposing tpm and taking the log of each entry
  for(i in 1:m)
    for(j in 1:m)
      log_Gamma_tr[j, i] = log(Gamma[i, j]);

  lp = rep_vector(-log(m), m); // 

    for(n in 1:N) {
      for(j in 1:m){
        if (y[n] == 0)
          lp_p1[j] = log_sum_exp(log_Gamma_tr[j] + lp) +
                     log_sum_exp(bernoulli_lpmf(1 | theta),
                     bernoulli_lpmf(0 | theta) + poisson_lpmf(y[n] | lambda[j]));
        else
          lp_p1[j] = log_sum_exp(log_Gamma_tr[j] + lp) + 
                     bernoulli_lpmf(0 | theta) + 
                     poisson_lpmf(y[n] | lambda[j]);
                   }
      lp = lp_p1;
                  }
    target += log_sum_exp(lp);
}'
mod_ZIP <- stan(model_code = ZIPHMM, data=list(N=length(y), y=y, m=2), iter=1000, chains=1)
print(mod_ZIP,digits_summary = 3)
               mean se_mean    sd     2.5%      25%      50%      75%    97.5% n_eff  Rhat
theta         0.518   0.002 0.052    0.417    0.484    0.518    0.554    0.621   568 0.998
lambda[1]     7.620   0.039 0.787    6.190    7.038    7.619    8.194    9.132   404 1.005
lambda[2]    20.544   0.039 0.957   18.861   19.891   20.500   21.189   22.611   614 1.005
Gamma[1,1]    0.664   0.004 0.094    0.473    0.604    0.669    0.730    0.841   541 0.998
Gamma[1,2]    0.336   0.004 0.094    0.159    0.270    0.331    0.396    0.527   541 0.998
Gamma[2,1]    0.163   0.003 0.066    0.057    0.114    0.159    0.201    0.312   522 0.999
Gamma[2,2]    0.837   0.003 0.066    0.688    0.799    0.841    0.886    0.943   522 0.999
lp__       -222.870   0.133 1.683 -227.154 -223.760 -222.469 -221.691 -220.689   161 0.999

True values

real = list(tpm = tpm, 
     zeroprop = nrow(serie[serie$m == 1 & serie$y == 0, ]) / nrow(serie[serie$m == 1,]),
     emit = t(t(tapply(serie$y[serie$y != 0],serie$m[serie$y != 0], mean))))
real
$tpm
     [,1] [,2]
[1,]  0.9  0.1
[2,]  0.2  0.8

$zeroprop
[1] 0.6341463

$emit
       [,1]
1 20.433333
2  7.277778

Estimates give quite oddly to someone could help me to know that I am doing wrong. As we see the estimates of stan zeroprop = 0.518 while the real value is 0.634, on the other hand the values of the t.p.m. in stan they are quite distant and the means lambda1 = 7.62 and lambda2 = 20.54 although they approximate enough gave in different order to the real 20.43 and 7.27. I think I'm making some mistake in defining the model in Stan but I do not know which.

Solution

Although I don't know the inner workings of the ZIP-HMM fitting algorithm, there are some obvious differences in what you have implemented in the Stan model and how the ZIP-HMM optimization algorithm describes itself. Addressing these appears to be sufficient to generate similar results.

Differences Between the Models

Initial State Probability

The values that the ZIP-HMM estimates, specifically fit1$prior, indicate that it includes an ability to learn a probability for initial state. However, in the Stan model, this is fixed to 1:1

lp = rep_vector(-log(m), m);

This should be changed to allow the model to estimate an initial state.

Priors on Parameters (optional)

The Stan model has non-flat priors on lambda and theta, but presumably the ZIP-HMM is not weighting the specific values it arrives. If one wanted to more realistically mimic the ZIP-HMM, then flat priors would be better. However, the ability to have non-flat priors in Stan is really an opportunity to develop a more well-tuned model than is achievable with standard HMM inference algorithms.

Zero-Inflation on State 1

From the documentation of the fasthmmfit method

Fast gradient descent / stochastic gradient descent algorithm to learn the parameters in a specialized zero-inflated hidden Markov model, where zero-inflation only happens in State 1. [emphasis added]

The Stan model assumes zero-inflation on all states. This is likely why the estimated theta value is deflated relative to the ZIP-HMM MAP estimate.

State Ordering

When estimating discrete latent states or clusters in Stan, one can use an ordered vector as a trick to mitigate against label switching issues. This is effectively achieved here with

positive_ordered[m] lambda;

However, since the ZIP-HMM only has zero-inflation on the first state, correctly implementing this behavior in Stan requires prior knowledge of what the rank of the lambda is for the "first" state. This seems very problematic for generalizing this code. For now, let's just move forward under the assumption that we can always recover this information somehow. In this specific case, we will assume that state 1 in the HMM has the higher lambda value, and therefore will be state 2 in the Stan model.

Updated Stan Model

Incorporating the above changes in the model should be something like

Stan Model

data {
  int<lower=0> N;    // length of chain
  int<lower=0> y[N]; // emissions
  int<lower=1> m;    // num states
}

parameters {
  simplex[m] start_pos;         // initial pos probs
  real<lower=0, upper=1> theta; // zero-inflation parameter
  positive_ordered[m] lambda;   // emission poisson params
  simplex[m] Gamma[m];          // transition prob matrix
}

model {
  vector[m] log_Gamma_tr[m];
  vector[m] lp;
  vector[m] lp_p1;

  // transposing tpm and taking the log of each entry
  for (i in 1:m) {
    for (j in 1:m) { 
      log_Gamma_tr[j, i] = log(Gamma[i, j]);
    }
  }

  // initial position log-lik
  lp = log(start_pos);

  for (n in 1:N) {
    for (j in 1:m) {
      // log-lik for state
      lp_p1[j] = log_sum_exp(log_Gamma_tr[j] + lp);

      // log-lik for emission
      if (j == 2) { // assuming only state 2 has zero-inflation
        if (y[n] == 0) {
          lp_p1[j] += log_mix(theta, 0, poisson_lpmf(0 | lambda[j]));
        } else {
          lp_p1[j] += log1m(theta) + poisson_lpmf(y[n] | lambda[j]);
        }
      } else {
        lp_p1[j] += poisson_lpmf(y[n] | lambda[j]);
      }
    }
    lp = lp_p1; // log-lik for next position
  }
  target += log_sum_exp(lp);
}

MAP Estimate

Loading the above as a string variable code.ZIPHMM, we first compile it and run a MAP estimate (since MAP estimation is going to behave most like the HMM fitting algorithm):

model.ZIPHMM <- stan_model(model_code=code.ZIPHMM)

// note the use of some initialization on the params,
// otherwise it can occasionally converge to strange extrema
map.ZIPHMM <- optimizing(model.ZIPHMM, algorithm="BFGS",
                         data=list(N=length(y), y=y, m=2),
                         init=list(theta=0.5, lambda=c(5,10)))

Examining the estimated parameters

> map.ZIPHMM$par
start_pos[1] start_pos[2]        
9.872279e-07 9.999990e-01 

theta    
6.342449e-01 

lambda[1]    lambda[2]   
7.370525e+00 2.038363e+01 

Gamma[1,1]   Gamma[2,1]   Gamma[1,2]   Gamma[2,2] 
6.700871e-01 7.253215e-02 3.299129e-01 9.274678e-01

shows they closely reflect the values that fasthmmfit inferred, excepting that the state orders are switched.

Sampling the Posterior

This model can also be run with MCMC to infer a full posterior,

samples.ZIPHMM <- stan(model_code = code.ZIPHMM, 
                       data=list(N=length(y), y=y, m=2), 
                       iter=2000, chains=4)

which samples well and yields similar results (and without any parameter initializations)

> samples.ZIPHMM
Inference for Stan model: b29a2b7e93b53c78767aa4b0c11b62a0.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

                mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
start_pos[1]    0.45    0.00 0.29    0.02    0.20    0.43    0.69    0.97  6072    1
start_pos[2]    0.55    0.00 0.29    0.03    0.31    0.57    0.80    0.98  6072    1
theta           0.63    0.00 0.05    0.53    0.60    0.63    0.67    0.73  5710    1
lambda[1]       7.53    0.01 0.72    6.23    7.02    7.49    8.00    9.08  4036    1
lambda[2]      20.47    0.01 0.87   18.83   19.87   20.45   21.03   22.24  5964    1
Gamma[1,1]      0.65    0.00 0.11    0.43    0.57    0.65    0.72    0.84  5664    1
Gamma[1,2]      0.35    0.00 0.11    0.16    0.28    0.35    0.43    0.57  5664    1
Gamma[2,1]      0.08    0.00 0.03    0.03    0.06    0.08    0.10    0.16  5605    1
Gamma[2,2]      0.92    0.00 0.03    0.84    0.90    0.92    0.94    0.97  5605    1
lp__         -214.76    0.04 1.83 -219.21 -215.70 -214.43 -213.43 -212.25  1863    1