I have the following kind of data:
df <- data.frame(id = rep(1, each = 40),
year = seq(1961,2000),
x1 = rbinom(40, size = 1, prob = 1 - 0.6) * rpois(40, lambda = 4),
X2 = rbinom(40, size = 1, prob = 1 - 0.7) * rpois(40, lambda = 4),
X3 = rbinom(40, size = 1, prob = 1 - 0.6) * rpois(40, lambda = 5),
X4 = rbinom(40, size = 1, prob = 1 - 0.7) * rpois(40, lambda = 6))
As you can see in my data there are four count variables. I want to estimate a multinomial-HMM, as I expect that there is a latent variable, C, with 3 possible states that affect Pr(X_t=xt) (each vector X_t is assumed to be mutually independent conditional on the Markov chain C_t). For example, I expect that if C_t=1 we would observe a vector of X_t more like this (4,1,0,0), if C_T=2 a vector of X_t more like this (0,1,1,0) and if C=3 it is more likely to observe a vector of X_t like this (0,0,1,5).
I haven't found a package able to estimate this type of model, so currently, I am using the depmixS4 package.
library(depmixS4)
mod<-depmix(list(X1 ~ 1, X2 ~ 1, X3 ~ 1, X4 ~ 1), data=df, nstates=3,
family=list(poisson(), poisson(), poisson(),poison()))
However, I am not sure this would be a correct model according to my theoretical expectations. Can I use depmix differently to be more suitable to my model?
You could simply use a multinomial distribution for the response. I'm assuming that you mean to let X1
, ..., X4
refer to four levels of a single categorical variable? And each of these variables then contains the count of a level in a particular year? One option is then to fit the following model:
> library(depmixS4)
Loading required package: nnet
Loading required package: MASS
Loading required package: Rsolnp
Loading required package: nlme
> set.seed(1)
> df <- data.frame(id = rep(1, each = 40),
+ year = seq(1961,2000),
+ X1 = rbinom(40, size = 1, prob = 1 - 0.6) * rpois(40, lambda = 4),
+ X2 = rbinom(40, size = 1, prob = 1 - 0.7) * rpois(40, lambda = 4),
+ X3 = rbinom(40, size = 1, prob = 1 - 0.6) * rpois(40, lambda = 5),
+ X4 = rbinom(40, size = 1, prob = 1 - 0.7) * rpois(40, lambda = 6))
> # matrix for single multinomial response variable
> X <- as.matrix(df[,c("X1", "X2", "X3", "X4")])
>
> # formulate model
> mod<-depmix(X ~ 1, data=df, nstates=3,
+ family=multinomial("identity"))
>
> # fit model
> fmod <- fit(mod)
converged at iteration 22 with logLik: -161.5714
>
> # show results
> summary(fmod)
Initial state probabilities model
pr1 pr2 pr3
0 0 1
Transition matrix
toS1 toS2 toS3
fromS1 0.290 0.355 0.356
fromS2 0.132 0.328 0.540
fromS3 0.542 0.191 0.267
Response parameters
Resp 1 : multinomial
Re1.pr1 Re1.pr2 Re1.pr3 Re1.pr4
St1 0.092 0.56 0.288 0.061
St2 0.033 0.00 0.423 0.544
St3 0.608 0.00 0.392 0.000