Search code examples
rpoissonprobability-density

Mixture Poisson distribution: mean and variance in R


So, I tried to simulate mixed Poisson distribution:

data2 <- data.frame(x = c(rpois(n = 50, lambda = 0), rpois(n = 450, lambda = 10)))

Then plot the histogram and density function upon it, for the distribution function I used function dmixpois from spatstat package. Here is the code for the plot:

ggplot(data2, aes(x = x)) +
  geom_histogram(aes(y = ..density..), bins = 15) +
  geom_line(aes(x = x, y = dmixpois(data2$x, mu = 9, sd = sqrt(41))))

Here is the plot: enter image description here

Clearly, the density function is wrong. As far as I know, the mean for mixed Poisson distribution is linear combination of the means for the singled distributions and the variance is E[lambda] + Var[lambda]. Here on the plot I only used the variance term, but if I add the expected value of lambda, I get the density to be even more steep. What is wrong with the computations?


Solution

  • The 'mixed Poisson' that you are simulating isn't the same as the mixed Poisson model in spatstat, which just assumes that the lambda of a Poisson distribution itself is a normally-distributed random variable. From the docs:

    In effect, the Poisson mean parameter lambda is randomised by setting lambda = invlink(Z) where Z has a Gaussian N(μ,σ²) distribution.

    It therefore won't simulate the mixing of two independent Poisson distributions.

    It looks like what you are simulating is a zero-inflated Poisson distribution, that is to say, a distribution where there is a probability of getting zero counts or a Poisson-distributed count for any given observation. There is a specific dzipois for this in the VGAM package.

    Remember also that a Poisson distribution is a discrete probability distribution, so you cannot properly show it with a continuous line, but rather only with points or spikes at the integer values.

    If you want to plot a distribution that matches your simulation, you can try the following:

    set.seed(1) # To make the example reproducible
    
    data2 <- data.frame(x = c(rpois(n = 50, lambda = 0), 
                              rpois(n = 450, lambda = 10)))
    
    
    ggplot(data2) +
      geom_histogram(aes(x = x, y = ..density..), breaks = 0:21 - 0.5) +
      geom_point(data = data.frame(x =0:20, y = VGAM::dzipois(0:20, 10, 1/10)),
                 aes(x, y))
    

    enter image description here