I'm trying to implement a HMM where the observations are the emissions from a first order HMM convolved with a wavelet.
That is:
With:
and
The code I have so far is, following along with the one dimensional case outlined here:
%%writefile HMM_code.stan
data {
int<lower=1> T; // number of observations (length)
int<lower=1> K; // number of hidden states
int<lower=1> H; // number of elements in wavelet vector
vector[K] dir_alpha;
real y[T]; // observations
matrix[T,H] W; // Wavelet matrix
cov_matrix[H] I; //prior covariance
}
parameters {
// Discrete state model
simplex[K] pi1; // initial state probabilities
simplex[K] A[K]; // transition probabilities
// A[i][j] = p(z_t = j | z_{t-1} = i)
// Continuous observation model
vector[H] Mu[K]; // observation means
real<lower=0> sigma; // observation standard deviations
cov_matrix[H] Sigma[K];
vector[T] eei[H]; //
}
transformed parameters {
vector[K] logalpha[T];
{ // Forward algorithm log p(z_t = j | y_{1:t})
real accumulator[K];
logalpha[1] = log(pi1) + multi_normal_lpdf(eei[1] | Mu[][1], Sigma[][1]);
for (t in 2:T) {
for (j in 1:K) { // j = current (t)
for (i in 1:K) { // i = previous (t-1)
// Murphy (2012) p. 609 eq. 17.48
// belief state + transition prob + local evidence at t
accumulator[i] = logalpha[t-1, i] + log(A[i, j]) + multi_normal_lpdf(eei[t] | Mu[][j], Sigma[][i]);
}
logalpha[t, j] = log_sum_exp(accumulator);
}
}
} // Forward
}
model{
Mu ~ multi_normal(rep_vector(0.0,H),I);
sigma ~ inv_gamma(.1,.1);
for(k in 1:K)
A[k] ~ dirichlet(dir_alpha);
for( j in 1:K)
Sigma[j] ~ inv_wishart(3.0, I); //Relevant part for this post
for (t in 1:T)
y[t] ~ normal(W[t,]*eei[t], pow(sigma,.5));// Likelihood
}
generated quantities {
vector[K] alpha[T];
vector[K] beta[T];
vector[K] gamma[T];
vector[K] loggamma[T];
int<lower=1, upper=K> zstar[T];
{ // Forward algortihm
for (t in 1:T)
alpha[t] = softmax(logalpha[t]);
} // Forward
{ // Backward algorithm log p(eei_{t+1:T} | z_t = j)
real accumulator[K];
vector[K] logbeta[T];
for (j in 1:K)
logbeta[T, j] = 1;
for (tforward in 0:(T-2)) {
int t;
t = T - tforward;
for (j in 1:K) { // j = previous (t-1)
for (i in 1:K) { // i = next (t)
// Murphy (2012) Eq. 17.58
// backwards t + transition prob + local evidence at t
accumulator[i] = logbeta[t, i] + log(A[j, i]) + multi_normal_lpdf(eei[t] | Mu[][i], Sigma[][i]);
}
logbeta[t-1, j] = log_sum_exp(accumulator);
}
}
for (t in 1:T)
beta[t] = softmax(logbeta[t]);
} // Backward
{ // forward-backward algorithm log p(z_t = j | eei_{1:T})
for(t in 1:T) {
loggamma[t] = alpha[t] .* beta[t];
}
for(t in 1:T)
gamma[t] = softmax(loggamma[t]);
} // forward-backward
{ // Viterbi algorithm
real logp_zstar;
int bpointer[T, K]; // backpointer to the most likely previous state on the most probable path
real delta[T, K]; // max prob for the sequence up to t
// that ends with an emission from state k
for (j in 1:K)
delta[1, K] = multi_normal_lpdf(eei[1] | Mu[][j], Sigma[][j]);
for (t in 2:T) {
for (j in 1:K) { // j = current (t)
delta[t, j] = negative_infinity();
for (i in 1:K) { // i = previous (t-1)
real logp;
logp = delta[t-1, i] + log(A[i, j]) + multi_normal_lpdf(eei[t] | Mu[][i], Sigma[][i]);
if (logp > delta[t, j]) {
bpointer[t, j] = i;
delta[t, j] = logp;
}
}
}
}
logp_zstar = max(delta[T]);
for (j in 1:K)
if (delta[T, j] == logp_zstar)
zstar[T] = j;
for (t in 1:(T - 1)) {
zstar[T - t] = bpointer[T - t + 1, zstar[T - t + 1]];
}
}
}
Compiled using:
%%time
sm = pystan.StanModel(file='HMM_code.stan')
And fit using:
%%time
fit = sm.sampling(data=HMM_data, iter=1, thin = 1, verbose = True)
The error I'm getting is in the first line evaluating the multi_normal_pdf function, saying that "the LDLT_Factor of covariance parameter is not positive definite. last conditional variance is 2.77556e-16"
Large covariance matrices can easily be numerically singular, even in this case where they are positive definite by construction. In order to avoid this problem and unnecessary computation, the usual thing to do in Stan is to declare the parameter as the Cholesky factor of a covariance or correlation matrix (in which case you also need to declare a positive vector of standard deviations in the parameters block).
In that case, you would call the multi_normal_cholesky_lpdf
function, which conditions on a Cholesky factor of a covariance matrix rather than the covariance matrix itself, which in turn can be a diag_pre_multiply
of a vector of standard deviations and the Cholesky factor of a correlation matrix. However, in that case, you would need to switch the priors from an inverse Wishart to a lkj_corr_cholesky_lpdf
for the Cholesky factor of the correlation matrix and whatever you want (such as exponential_lpdf
) for the standard deviations. This is all discussed in the Stan user manual.
Another option would be to integrate out the covariance matrices.