I try to sample from the following mixture models of gamma distribution:
The R code is as follows:
The algorithm can be translated into a vectorized approach.
Step 1: Generate a random sample k_1,...,k_n
of integers in a vector k
,where P(k)=θ_k
, k=1,...,5.
Step 2: Set rate=1/k.
n <- 5000
k <- sample(1:5, size=n, replace=TRUE, prob=(1:5)/15)
rate <- 1/k
x <- rgamma(n, shape=3, rate=rate)
My question is why x
is now the mixture of these five gamma distributions? In the expression of the mixture model, it seems that we also need coefficient theta_k
?
Here are two ways to compare samples from a Gamma mixture distribution with the expected mixture density. This should help understand how F_X is the cumulative distribution function of a mixture of Gamma distributions.
# Fix random seed for reproducibility
set.seed(2022)
# Sample data
n <- 100000
X <- unlist(lapply(1:5, function(j) rgamma(n * j, shape = 3, rate = 1 / j)))
# Weighted Gamma mixture density
dmix <- function(x) sapply(
x,
function(val) sum((1:5) / 15 * dgamma(val, shape = 3, rate = 1 / (1:5))))
library(tibble)
data = tibble(x = seq(0, ceiling(max(X)), length.out = 100), y = dmix(x))
# Plot histogram of samples and compare with density
library(ggplot2)
ggplot(data.frame(x = X), aes(x)) +
geom_histogram(aes(y = ..density..), bins = 200) +
geom_line(data = data, aes(x, y))
Comments:
1:5
.aes(y = ..density..)
to express the histogram as a properly normalised density so that we can compare values with the mixture density dmix
.