
How to model a mixture of finite components from different parametric families with JAGS?


Imagine an underlying process that draws a number from a uniform distribution with probability $\alpha$ and from a normal distribution with probability $1 - \alpha$. The observed sequence of numbers generated by this process therefore follows a distribution $f$ that is a mixture of two components with mixing weights $\alpha$ and $1 - \alpha$. How would you model this kind of mixture with JAGS when the observed sequence is the only input, but the parametric families are known?
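
Concretely, in the simulation below the observed density is $f(y) = \alpha \, \mathrm{Unif}(y;\, \mu,\, \mathrm{max}) + (1 - \alpha) \, \mathcal{N}(y;\, \mu,\, 1)$, with $\alpha$ the weight of the uniform component.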

Example (in R):

set.seed(8361299)
N <- 100
alpha <- 0.3
mu <- 5
max <- 50
# Which component to choose from?
latent_class <- rbinom(N, 1, alpha)
Y <- ifelse(latent_class, runif(N, min=mu, max=max), rnorm(N, mean=mu))

The generated (observed) Y looks like this: [figure omitted: plot of the generated Y]

With JAGS, it should be possible to estimate the mixing weights as well as the parameters of the known components, shouldn't it?


Solution

  • Mixture models whose components all come from the same parametric family are pretty straightforward in JAGS/BUGS, but mixture models with varying parametric responses (like yours) are a little trickier. One method is to use the 'ones trick', whereby we manually calculate the likelihood of the response (selecting one of the two distributions as specified by the latent part of the model) and fit this to a (fake) Bernoulli response of 1 for each data point. Since the likelihood of observing a 1 from a Bernoulli trial is just its success probability, each of these terms contributes exactly the calculated density to the joint likelihood, provided that density never exceeds 1. For example:

    # Your data generation:
    set.seed(8361299)
    N <- 100
    alpha <- 0.3
    mu <- 5
    max <- 50
    # Which component to choose from?
    latent_class <- rbinom(N, 1, alpha)
    Y <- ifelse(latent_class, runif(N, min=mu, max=max), rnorm(N, mean=mu))
    
    # The model:
    model <- "model{
    
        for(i in 1:N){
    
            # Log density for the normal part:
            ld_norm[i] <- logdensity.norm(Y[i], mu, tau)
    
            # Log density for the uniform part:
            ld_unif[i] <- logdensity.unif(Y[i], lower, upper)
    
            # Select one of these two densities:
            density[i] <- exp(ld_norm[i]*norm_chosen[i] + ld_unif[i]*(1-norm_chosen[i]))
    
            # Generate a likelihood for the MCMC sampler:
            Ones[i] ~ dbern(density[i])
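        # (Ones[i] is always 1, so this Bernoulli term contributes
        # exactly density[i] to the likelihood: the 'ones trick')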
    
            # The latent class part as usual:
            norm_chosen[i] ~ dbern(prob)
    
        }
    
        # Priors:
        lower ~ dnorm(0, 10^-6)
        upper ~ dnorm(0, 10^-6)
        prob ~ dbeta(1,1)
        mu ~ dnorm(0, 10^-6)
        tau ~ dgamma(0.01, 0.01)
    
        # Specify monitors, data and initial values using runjags:
        #monitor# lower, upper, prob, mu, tau
        #data# N, Y, Ones
        #inits# lower, upper
    }"
    
    # Run the model using runjags (or use rjags if you prefer!)
    library('runjags')
    
    lower <- min(Y)-10
    upper <- max(Y)+10
    
    Ones <- rep(1,N)
    
    results <- run.jags(model, sample=20000, thin=1)
    results
    
    plot(results)
    

    This seems to recover your parameters pretty well (your alpha is 1-prob), but watch out for autocorrelation (and convergence).
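
    To check that more formally, here is a minimal sketch of the usual runjags diagnostics (assuming the results object from above; the extend.jags arguments are illustrative, not prescriptive):

    # Gelman-Rubin statistics (psrf) and effective sample sizes appear in the summary:
    summary(results)
    
    # Trace and autocorrelation plots for a single monitored parameter:
    plot(results, plot.type=c("trace", "autocorr"), vars="prob")
    
    # If autocorrelation is severe, extend the chains and thin more aggressively:
    results <- extend.jags(results, sample=50000, thin=10)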

    Matt


    EDIT: Since you asked about generalising to more than 2 distributions, here is equivalent (but more generalisable) code:

    # The model:
    model <- "model{
        for(i in 1:N){
            # Log density for the normal part:
            ld_comp[i, 1] <- logdensity.norm(Y[i], mu, tau)
            # Log density for the uniform part:
            ld_comp[i, 2] <- logdensity.unif(Y[i], lower, upper)
            # Select one of these two densities and normalise with a Constant:
            density[i] <- exp(ld_comp[i, component_chosen[i]] - Constant)
            # Generate a likelihood for the MCMC sampler:
            Ones[i] ~ dbern(density[i])
            # The latent class part using dcat:
            component_chosen[i] ~ dcat(probs)
        }
        # Prior for the 2 mixing weights using a Dirichlet distribution:
        probs ~ ddirch(c(1,1))
        lower ~ dnorm(0, 10^-6)
        upper ~ dnorm(0, 10^-6)
        mu ~ dnorm(0, 10^-6)
        tau ~ dgamma(0.01, 0.01)
        # Specify monitors, data and initial values using runjags:
        #monitor# lower, upper, probs, mu, tau
        #data# N, Y, Ones, Constant
        #inits# lower, upper, mu, tau
    }"
    
    library('runjags')
    
    # Initial values to get the chains started:
    lower <- min(Y)-10
    upper <- max(Y)+10
    mu <- 0
    tau <- 0.01
    
    Ones <- rep(1,N)
    
    # The Constant must be large enough that exp(log-density - Constant) never
    # exceeds 1 (dbern needs a valid probability), but not so large that those
    # probabilities underflow towards zero:
    Constant <- 10
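    # (For reference: the normal component's log-density peaks at 0.5*log(tau/(2*pi))
    # and the uniform's at -log(upper-lower), so Constant=10 comfortably bounds both
    # for any parameter values these chains are likely to visit.)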
    
    results <- run.jags(model, sample=10000, thin=1)
    results
    

    This code will work for as many distributions as you need, but expect increasingly severe autocorrelation (and slower convergence) as the number of components grows.
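
    As a concrete illustration, here is an untested sketch of how a third component might be slotted in. The exponential component, its rate parameter and the dirch_alpha data vector are assumptions added for illustration (this also assumes JAGS provides logdensity.exp, by analogy with the logdensity functions above, and that Y is positive so the exponential density is defined); the remaining data and initial values are reused from the scripts above:

    # Hypothetical extension to 3 components (normal + uniform + exponential):
    model3 <- "model{
        for(i in 1:N){
            ld_comp[i, 1] <- logdensity.norm(Y[i], mu, tau)
            ld_comp[i, 2] <- logdensity.unif(Y[i], lower, upper)
            ld_comp[i, 3] <- logdensity.exp(Y[i], rate)
            density[i] <- exp(ld_comp[i, component_chosen[i]] - Constant)
            Ones[i] ~ dbern(density[i])
            component_chosen[i] ~ dcat(probs)
        }
        # Dirichlet prior over the 3 mixing weights, with prior weights passed as data:
        probs ~ ddirch(dirch_alpha)
        lower ~ dnorm(0, 10^-6)
        upper ~ dnorm(0, 10^-6)
        mu ~ dnorm(0, 10^-6)
        tau ~ dgamma(0.01, 0.01)
        rate ~ dgamma(0.01, 0.01)
        #monitor# lower, upper, probs, mu, tau, rate
        #data# N, Y, Ones, Constant, dirch_alpha
        #inits# lower, upper, mu, tau
    }"
    
    dirch_alpha <- c(1,1,1)
    
    results3 <- run.jags(model3, sample=10000, thin=1)
    results3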