Search code examples
rdistributionpoisson

Fitting zero inflated poisson to plot it in R


I have the following data

data<-c(1L, 4L, 5L, 10L, 13L, 8L, 3L, 5L, 13L, 9L, 5L, 10L, 9L, 4L, 
4L, 13L, 10L, 10L, 7L, 7L, 3L, 1L, 11L, 4L, 5L, 9L, 10L, 3L, 
2L, 7L, 8L, 4L, 5L, 6L, 3L, 4L, 13L, 7L, 8L, 6L, 5L, 3L, 10L, 
4L, 8L, 8L, 2L, 9L, 5L, 2L, 8L, 7L, 6L, 6L, 6L, 4L, 3L, 9L, 11L, 
6L, 7L, 7L, 3L, 4L, 18L, 14L, 8L, 9L, 5L, 3L, 7L, 3L, 8L, 3L, 
9L, 3L, 4L, 7L, 7L, 5L, 8L, 7L, 10L, 9L, 9L, 11L, 8L, 3L, 9L, 
10L, 11L, 9L, 12L, 13L, 9L, 15L, 11L, 13L, 3L, 24L, 11L, 13L, 
14L, 14L, 5L, 10L, 6L, 10L, 8L, 9L, 13L, 5L, 8L, 8L, 6L, 17L, 
11L, 11L, 8L, 2L, 14L, 6L, 1L, 7L, 5L, 3L, 12L, 6L, 10L, 7L, 
15L, 9L, 7L, 3L, 9L, 11L, 3L, 5L, 14L, 7L, 3L, 20L, 17L, 14L, 
7L, 11L, 11L, 2L, 4L, 9L, 5L, 10L, 7L, 10L, 13L, 7L, 18L, 13L, 
18L, 20L, 16L, 9L, 5L, 13L, 16L, 11L, 9L, 7L, 12L, 13L, 21L, 
9L, 7L, 13L, 4L, 7L, 5L, 13L, 19L, 17L, 8L, 7L, 4L, 18L, 14L, 
8L, 8L, 16L, 13L, 9L, 14L, 8L, 20L, 7L, 12L, 14L, 8L, 16L, 10L, 
9L, 20L, 5L, 7L, 8L, 16L, 11L, 10L, 12L, 20L, 5L, 2L, 21L, 16L, 
18L, 0L, 16L, 4L, 6L, 16L, 6L, 15L, 15L, 10L, 8L, 13L, 22L, 14L, 
5L, 8L, 11L, 14L, 7L, 9L, 7L, 7L, 8L, 5L, 12L, 6L, 20L, 10L, 
17L, 9L, 7L, 13L, 9L, 13L, 15L, 18L, 10L, 8L, 10L, 12L, 16L, 
16L, 11L, 13L, 8L, 8L, 20L, 16L, 11L, 14L, 18L, 10L, 8L, 17L, 
24L, 8L, 15L, 16L, 9L, 10L, 22L, 15L, 16L, 16L, 20L, 16L, 7L, 
12L, 10L, 16L, 16L, 17L, 16L, 13L, 4L, 14L, 14L, 18L, 11L, 4L, 
3L, 10L, 19L, 9L, 9L, 10L, 4L, 9L, 9L, 5L, 6L, 13L, 7L, 4L, 2L, 
7L, 13L, 6L, 4L, 3L, 6L, 5L, 2L, 9L, 6L, 10L, 9L, 3L, 2L, 7L, 
12L, 14L, 12L, 12L, 2L, 4L, 7L, 5L, 7L, 9L, 5L, 6L, 6L, 9L, 10L, 
6L, 11L, 4L, 6L, 3L, 5L, 3L, 5L, 4L, 10L, 7L, 4L, 6L, 9L, 11L, 
6L, 10L, 3L, 1L, 9L, 9L, 11L, 8L, 3L, 5L, 7L, 6L, 8L, 8L, 9L, 
4L, 2L, 5L, 7L, 13L, 6L, 12L, 3L, 9L, 7L, 4L, 6L, 8L, 11L, 9L, 
4L, 5L, 10L, 11L, 17L, 15L, 3L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 
0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 
0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 2L, 1L, 2L, 0L, 1L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
3L, 16L, 17L, 6L, 6L, 9L, 6L, 12L, 6L, 13L, 6L, 5L, 9L, 6L, 14L, 
2L, 17L, 4L, 10L, 6L, 1L, 15L, 8L, 8L, 5L, 7L, 7L, 8L, 12L, 2L, 
3L, 7L, 11L, 6L, 9L, 10L, 11L, 11L, 4L, 12L, 1L, 7L, 6L, 3L, 
8L, 11L, 7L, 6L, 5L, 5L, 11L, 7L, 7L, 6L, 7L, 5L, 7L, 10L, 5L, 
4L, 7L, 5L, 9L, 7L, 14L, 10L, 4L, 9L, 5L, 10L, 12L, 14L, 6L, 
5L, 12L, 5L, 3L, 8L, 8L, 4L, 9L, 9L, 12L, 2L, 8L, 5L, 4L, 5L, 
1L, 4L, 4L, 7L, 6L, 8L, 10L, 13L, 9L, 4L, 8L, 8L, 9L, 12L, 4L, 
7L, 6L, 5L, 5L, 7L, 2L, 5L, 10L, 0L, 4L, 6L, 5L, 3L, 8L, 2L, 
1L, 1L, 6L, 6L, 1L, 2L, 5L, 9L, 10L, 7L, 10L, 3L, 12L, 7L, 4L, 
1L, 5L, 6L, 6L, 5L, 4L, 1L, 5L, 0L, 8L, 6L, 4L, 1L, 7L, 5L, 3L, 
8L, 3L, 0L, 3L, 2L, 0L, 6L, 10L, 0L, 8L, 3L, 0L, 1L, 1L, 5L, 
7L, 0L, 1L, 0L, 3L, 1L, 9L, 2L, 8L, 1L, 0L, 0L, 5L, 1L, 0L, 2L, 
1L, 0L, 7L, 1L, 2L, 0L, 0L, 4L, 4L, 10L, 0L, 6L, 4L, 3L, 0L, 
4L, 1L, 3L, 1L, 0L, 0L, 0L, 5L, 0L, 6L, 6L, 3L, 5L, 0L, 4L, 0L, 
2L, 3L, 5L, 2L, 4L, 3L, 1L, 1L, 0L, 2L, 0L, 3L, 0L, 3L, 4L, 4L, 
7L, 0L, 0L, 1L, 9L, 0L, 3L, 0L, 4L, 0L, 3L, 4L, 5L, 0L, 0L, 4L, 
3L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 
0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 
0L, 0L, 0L, 1L, 0L, 0L, 1L, 2L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 
1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 2L, 0L, 0L, 
0L, 0L, 2L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 13L, 10L, 13L, 10L, 11L, 
8L, 27L, 8L, 12L, 20L, 15L, 9L, 10L, 3L, 8L, 13L, 16L, 13L, 12L, 
13L, 10L, 14L, 14L, 10L, 10L, 7L, 13L, 12L, 12L, 23L, 7L, 12L, 
6L, 7L, 10L, 8L, 13L, 16L, 10L, 11L, 18L, 7L, 15L, 18L, 10L, 
9L, 15L, 4L, 3L, 9L, 12L, 2L, 6L, 4L, 4L, 8L, 4L, 7L, 11L, 9L, 
7L, 9L, 15L, 7L, 7L, 14L, 15L, 6L, 3L, 7L, 6L, 22L, 7L, 8L, 6L, 
12L, 7L, 11L, 10L, 6L, 10L, 6L, 5L, 16L, 11L, 11L, 6L, 9L, 10L, 
4L, 14L, 7L, 6L, 4L, 9L, 4L, 7L, 10L, 11L, 8L, 6L, 7L, 3L, 8L, 
8L, 12L, 7L, 13L, 5L, 4L, 10L, 6L, 8L, 7L, 11L, 3L, 3L, 5L, 4L, 
4L, 11L, 3L, 3L, 3L, 3L, 7L, 4L, 5L, 3L, 5L, 1L, 5L, 2L, 5L, 
6L, 6L, 4L, 3L, 6L, 7L, 3L, 8L, 1L, 3L, 5L, 9L, 9L, 10L, 6L, 
9L, 7L, 5L, 5L, 10L, 6L, 9L, 2L, 6L, 6L, 1L, 6L, 4L, 5L, 3L, 
3L, 3L, 3L, 3L, 2L, 6L, 1L, 5L, 3L, 4L, 9L, 3L, 8L, 5L, 7L, 5L, 
10L, 5L, 4L, 0L, 8L, 6L, 4L, 6L, 7L, 4L, 3L, 1L, 3L, 3L, 6L, 
5L, 7L, 3L, 7L, 2L, 2L, 6L, 4L, 3L, 3L, 2L, 2L, 4L, 2L, 5L, 5L, 
7L, 3L, 5L, 2L, 2L, 1L, 5L, 1L, 3L, 2L, 5L, 3L, 1L, 4L, 0L, 1L, 
4L, 3L, 2L, 2L, 2L, 6L, 3L, 4L, 2L, 2L, 8L, 4L, 3L, 6L, 6L, 2L, 
4L, 11L, 3L, 4L, 4L, 5L, 5L, 1L, 5L, 2L, 7L, 3L, 2L, 4L, 2L, 
3L, 6L, 3L, 11L, 7L, 5L, 9L, 5L, 6L, 5L, 9L, 6L, 5L, 7L, 1L, 
14L, 7L, 7L, 7L, 2L, 5L, 5L, 9L, 2L, 9L, 2L, 6L, 2L, 9L, 4L, 
3L, 4L, 9L, 7L, 6L, 5L, 4L, 5L, 6L, 4L, 5L, 2L, 5L, 4L, 7L, 3L, 
9L, 6L, 9L, 7L, 2L, 7L, 6L, 7L, 3L, 4L, 8L, 3L, 8L, 10L, 3L, 
3L, 5L, 4L, 8L, 6L, 5L, 4L, 5L, 1L, 6L, 6L, 8L, 9L, 5L, 10L, 
1L, 8L, 7L, 7L, 6L, 5L, 1L, 5L, 8L, 11L, 2L, 6L, 7L, 6L, 5L, 
20L, 8L, 10L, 7L, 5L, 2L, 5L, 3L, 17L, 6L, 5L, 0L, 1L, 1L, 9L, 
1L)

I have run a ZINB model and I know that it is the best fit for my data. I want to demonstrate on a graph that this distribution is my best option. I am using fitdist

library(fitdistrplus)
library(gamlss)

nb<-fitdist(data, "nbinom")
pois<-fitdist(data, "pois")
zinb<-fitdist(data, 'ZANBI',start = list(mu = 4, sigma = 0.2))
par(mfrow = c(2, 2))
plot.legend <- c("Negative binomial", "Poisson", "ZINB")

My problem is that, just as I wanted to demonstrate that nbinom and pois are not the best fit, I can't do it with zero inflated poissonZIP.

I am using gamlss

zip<-fitdist(data, 'ZIP',start = list(mu = 7.09, sigma = 4.5))

Here I'm using the values suggested in here considering mean(data[data != 0]) and var(data[data != 0]). I always get:

Error in fitdist(data, "ZIP", start = list(mu = 7.09, sigma = 4.5)) : 
  the function mle failed to estimate the parameters, 
                with the error code 100
In addition: Warning messages:
1: In fitdist(data, "ZIP", start = list(mu = 7.09, sigma = 4.5)) :
  The dZIP function should return a zero-length vector when input has length zero and not raise an error
2: In fitdist(data, "ZIP", start = list(mu = 7.09, sigma = 4.5)) :
  The pZIP function should return a zero-length vector when input has length zero and not raise an error

How can I plot a ZIP of my values to demonstrate is not the best fit?


Solution

  • The following arguments on the ZIP fit worked for me:

    • A start sigma < 1.
    • The Nelder-Mead optimizer
    • A (lower, upper) bounds for the optimization parameters mu and sigma set respectively to (0, Inf) and (0, 1),

    The result of running the following code on your data array is below, which confirms that the Zero-Inflated Negative Binomial is the best fit (based on AIC and BIC).

    library(fitdistrplus)
    library(gamlss)
    
    nb<-fitdist(data, "nbinom")
    pois<-fitdist(data, "pois")
    zinb<-fitdist(data, 'ZANBI',start = list(mu = 4, sigma = 0.2))
    
    zip<-fitdist(data, 'ZIP', start = list(mu = 7.09, sigma = 0.5), discrete=TRUE,
    optim.method="Nelder-Mead", lower = c(0, 0), upper = c(Inf, 1))
    
    print(nb)
    print(pois)
    print(zinb)
    print(zip)
    
    cdfcomp(list(nb, zinb, pois, zip))
    gofstat(list(nb, zinb, pois, zip))
    

    The only thing that worries me is that the standard error of the estimated parameters for the ZIP fit are NA...

    Partial OUTPUT

    Fitting of the distribution ' nbinom ' by maximum likelihood 
    Parameters:
         estimate Std. Error
    size 1.007110 0.05297338
    mu   5.548579 0.16643396
    
    Fitting of the distribution ' pois ' by maximum likelihood 
    Parameters:
           estimate Std. Error
    lambda 5.548313 0.06522914
    
    Fitting of the distribution ' ZANBI ' by maximum likelihood 
    Parameters:
           estimate Std. Error
    mu    6.8886199  0.1549058
    sigma 0.3401722  0.0266448
    
    Fitting of the distribution ' ZIP ' by maximum likelihood 
    Parameters:
           estimate Std. Error
    mu    7.0869552         NA
    sigma 0.2171502         NA
    
    Goodness-of-fit criteria
                                   1-mle-nbinom 2-mle-ZANBI 3-mle-pois 4-mle-ZIP
    Akaike's Information Criterion     7302.831    7141.004   10169.16  7981.985
    Bayesian Information Criterion     7313.177    7151.350   10174.33  7992.331
    

    enter image description here