Search code examples
rhidden-markov-modelsmultinomial

multinomial-hmm using R


I have the following kind of data:

df <- data.frame(id = rep(1, each = 40), 
                        year = seq(1961,2000),
                        x1 = rbinom(40, size = 1, prob = 1 - 0.6) * rpois(40, lambda = 4),
                        X2 = rbinom(40, size = 1, prob = 1 - 0.7) * rpois(40, lambda = 4),
                        X3 = rbinom(40, size = 1, prob = 1 - 0.6) * rpois(40, lambda = 5),
                        X4 = rbinom(40, size = 1, prob = 1 - 0.7) * rpois(40, lambda = 6))

As you can see in my data there are four count variables. I want to estimate a multinomial-HMM, as I expect that there is a latent variable, C, with 3 possible states that affect Pr(X_t=xt) (each vector X_t is assumed to be mutually independent conditional on the Markov chain C_t). For example, I expect that if C_t=1 we would observe a vector of X_t more like this (4,1,0,0), if C_T=2 a vector of X_t more like this (0,1,1,0) and if C=3 it is more likely to observe a vector of X_t like this (0,0,1,5).

I haven't found a package able to estimate this type of model, so currently, I am using the depmixS4 package.

library(depmixS4)
mod<-depmix(list(X1 ~ 1, X2 ~ 1, X3 ~ 1, X4 ~ 1), data=df, nstates=3,
                  family=list(poisson(), poisson(), poisson(),poison()))

However, I am not sure this would be a correct model according to my theoretical expectations. Can I use depmix differently to be more suitable to my model?


Solution

  • You could simply use a multinomial distribution for the response. I'm assuming that you mean to let X1, ..., X4 refer to four levels of a single categorical variable? And each of these variables then contains the count of a level in a particular year? One option is then to fit the following model:

    > library(depmixS4)
    Loading required package: nnet
    Loading required package: MASS
    Loading required package: Rsolnp
    Loading required package: nlme
    > set.seed(1)
    > df <- data.frame(id = rep(1, each = 40), 
    +                  year = seq(1961,2000),
    +                  X1 = rbinom(40, size = 1, prob = 1 - 0.6) * rpois(40, lambda = 4),
    +                  X2 = rbinom(40, size = 1, prob = 1 - 0.7) * rpois(40, lambda = 4),
    +                  X3 = rbinom(40, size = 1, prob = 1 - 0.6) * rpois(40, lambda = 5),
    +                  X4 = rbinom(40, size = 1, prob = 1 - 0.7) * rpois(40, lambda = 6))
    > # matrix for single multinomial response variable
    > X <- as.matrix(df[,c("X1", "X2", "X3", "X4")])
    > 
    > # formulate model
    > mod<-depmix(X ~ 1, data=df, nstates=3,
    +             family=multinomial("identity"))
    > 
    > # fit model
    > fmod <- fit(mod)
    converged at iteration 22 with logLik: -161.5714 
    > 
    > # show results
    > summary(fmod)
    Initial state probabilities model 
    pr1 pr2 pr3 
      0   0   1 
    
    Transition matrix 
            toS1  toS2  toS3
    fromS1 0.290 0.355 0.356
    fromS2 0.132 0.328 0.540
    fromS3 0.542 0.191 0.267
    
    Response parameters 
    Resp 1 : multinomial 
        Re1.pr1 Re1.pr2 Re1.pr3 Re1.pr4
    St1   0.092    0.56   0.288   0.061
    St2   0.033    0.00   0.423   0.544
    St3   0.608    0.00   0.392   0.000