
How to fit a "Negative Binomial" distribution to a histogram using ggplot2?

I am working with a dataset that I believe follows a "Negative Binomial" distribution. However, when I fit the Negative Binomial distribution, it turns out to be a poor fit. To explore further, I simulated a Negative Binomial distribution, but even on the simulated data, the overlaying distribution does not provide a good fit.

Here is my simulated data:

# Simulation parameters
n <- 1000  # Number of random numbers
size <- 5  # Number of successes
prob <- 0.3  # Probability of success

# Generating negative binomial random numbers
negative_binomial <- rnbinom(n, size, prob)
xx <- data.frame(negative_binomial)

I want to create a histogram of this data with an overlay of the fitted 'Negative Binomial' distribution. Let's assume that I was given this data, so I had to estimate the parameters of the distribution using fitdistr() from the MASS package.

library(MASS)     # fitdistr()
library(ggplot2)

# Maximum-likelihood fit; the estimates are named "size" and "mu"
fit <- fitdistr(negative_binomial, densfun = "negative binomial")
ggplot(data = xx, aes(negative_binomial)) +
  geom_histogram(
    aes(y = ..density..),
    bins = 18, color = "black", fill = "lightblue") +
  stat_function(fun = dnbinom,
    args = list(mu = fit$estimate["mu"], size = fit$estimate["size"]),
    color = "red", size = 1)

Question: Despite knowing that the simulated data is Negative Binomial, why does the overlaying distribution provide such a poor fit to the data? What did I do wrong?



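  • The main issue is that you are plotting a discrete distribution, but stat_function() evaluates the density function on a continuous grid of x values. dnbinom() returns 0 for every non-integer value, so most of the overlaid curve is 0. Evaluate the density on the integers instead and draw it with geom_linerange():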

    library(fitdistrplus)  # fitdist()
    library(ggplot2)

    fit <- fitdist(negative_binomial, distr = "nbinom")
    # Evaluate the fitted density only on the integer support
    yy <- data.frame(negative_binomial = 0:45)
    yy$density <- dnbinom(yy$negative_binomial,
                         mu = fit$estimate["mu"], size = fit$estimate["size"])
    ggplot(data = xx, aes(negative_binomial)) +
      geom_histogram(
        aes(y = ..density..),
        binwidth = 1, color = "black", fill = "lightblue") +
      geom_linerange(data = yy, aes(ymin = 0, ymax = density),
                    color = "red")

    resulting plot showing a histogram of the data and the fitted distribution
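
To see the effect described above without any plotting, here is a minimal sketch in base R. The parameter values are assumptions taken from the simulation in the question (size = 5, prob = 0.3) rather than fitted estimates, and the 101-point grid over 0 to 45 only approximates what stat_function() evaluates by default.

```r
# Illustrative parameters (assumed, not fitted): size = 5, prob = 0.3
size <- 5
mu <- size * (1 - 0.3) / 0.3  # mean implied by size and prob

# A continuous grid, roughly like the one stat_function() evaluates
grid <- seq(0, 45, length.out = 101)
# dnbinom() warns about non-integer x and returns 0 for those points
dens_grid <- suppressWarnings(dnbinom(grid, size = size, mu = mu))

# Flag which grid points happen to fall on integers
is_int <- abs(grid - round(grid)) < 1e-8
n_zero <- sum(dens_grid == 0)  # count of grid points with zero density

# Evaluating on the integer support recovers the probability mass function
dens_int <- dnbinom(0:45, size = size, mu = mu)
total_mass <- sum(dens_int)  # close to 1, apart from the tail beyond 45

cat("zero-density grid points:", n_zero, "of", length(grid), "\n")
cat("total mass on 0:45:", total_mass, "\n")
```

Only the few grid points that land on integers get a positive density, which is why the red curve in the question looks like it barely touches the histogram.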