I am working with a dataset that I believe follows a negative binomial distribution. However, when I fit a negative binomial distribution to it, the fit looks poor. To investigate, I simulated data from a negative binomial distribution, but even on the simulated data the overlaid distribution does not fit well.
Here is how I simulated the data:
library(ggplot2)
library(MASS)
library(fitdistrplus)
# Simulation parameters
set.seed(123)  # for reproducibility
n <- 1000      # number of random draws
size <- 5      # target number of successes
prob <- 0.3    # probability of success in each trial
# Generate negative binomial random numbers
negative_binomial <- rnbinom(n, size = size, prob = prob)
xx <- data.frame(negative_binomial)
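A quick way to confirm that the simulation itself is fine, before any plotting, is to compare the sample moments with the theoretical ones. For the `(size, prob)` parameterisation used by `rnbinom()`, the mean is size * (1 - prob) / prob and the variance is size * (1 - prob) / prob^2. This is a minimal sketch; the seed is an arbitrary choice for reproducibility.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 1000
size <- 5
prob <- 0.3
x <- rnbinom(n, size = size, prob = prob)

# Theoretical moments of the (size, prob) parameterisation used by rnbinom()
theo_mean <- size * (1 - prob) / prob    # about 11.67
theo_var  <- size * (1 - prob) / prob^2  # about 38.89

# The sample values should be close to the theoretical ones
print(c(sample_mean = mean(x), theory_mean = theo_mean))
print(c(sample_var  = var(x),  theory_var  = theo_var))
```

If these agree, the data are fine and any mismatch in the plot comes from how the overlay is drawn.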
I want to create a histogram of this data with the fitted negative binomial distribution overlaid. Assume I was only given the data, so I have to estimate the parameters with fitdistr():
fit <- fitdistr(negative_binomial, densfun = "negative binomial")
ggplot(data = xx, aes(negative_binomial)) +
  geom_histogram(aes(y = after_stat(density)),
                 bins = 18, color = "black", fill = "lightblue") +
  stat_function(fun = dnbinom,
                args = list(mu = fit$estimate["mu"], size = fit$estimate["size"]),
                color = "red", linewidth = 1)
Question: Given that the simulated data really is negative binomial, why does the overlaid distribution fit the data so poorly? What did I do wrong?
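The main issue is that you are plotting a discrete distribution but feeding its density function continuous values. stat_function() evaluates fun on an evenly spaced grid across the x-axis range (101 points by default), and dnbinom() returns 0, with a "non-integer x" warning, for every non-integer value, so the red curve is 0 almost everywhere. A second problem is that bins = 18 makes each bar wider than one unit, so the bars do not line up with the individual integer probabilities. Instead, evaluate the probability mass function only at the integers and use binwidth = 1: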
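The zero-for-non-integers behaviour can be seen directly. The sketch below evaluates `dnbinom()` at an integer, at a non-integer, and on a grid shaped like the one `stat_function()` builds by default (101 evenly spaced points); the 0 to 45 range is an assumption chosen to roughly match the simulated data.

```r
size <- 5
prob <- 0.3

# At an integer, dnbinom() returns a proper probability
p_int <- dnbinom(10, size = size, prob = prob)

# At a non-integer it returns 0 (and warns "non-integer x"); warning silenced here
p_frac <- suppressWarnings(dnbinom(10.5, size = size, prob = prob))

# Grid similar to stat_function()'s default: 101 points spread over the x range
grid <- seq(0, 45, length.out = 101)
dens <- suppressWarnings(dnbinom(grid, size = size, prob = prob))

print(c(p_int = p_int, p_frac = p_frac))
# Only the grid points that happen to land on integers get a non-zero value
print(grid[dens > 0])
```

With a step of 0.45, only a handful of grid points fall exactly on integers, which is why the red curve in the question collapses to a few isolated spikes.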
# fitdistrplus::fitdist() also estimates size and mu
fit <- fitdist(negative_binomial, distr = "nbinom")
# Evaluate the fitted PMF at the integers only
yy <- data.frame(negative_binomial = 0:max(negative_binomial))
yy$density <- dnbinom(yy$negative_binomial,
                      mu = fit$estimate["mu"], size = fit$estimate["size"])
# binwidth = 1 gives one bar per integer value, centred on that value
ggplot(data = xx, aes(negative_binomial)) +
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = 1, color = "black", fill = "lightblue") +
  geom_linerange(data = yy, aes(ymin = 0, ymax = density),
                 color = "red")
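Two follow-up checks may help. Both `MASS::fitdistr()` and `fitdistrplus::fitdist()` report the `(size, mu)` parameterisation rather than `(size, prob)`; the two are related by prob = size / (size + mu), so the estimates can be compared with the simulation values. A table of observed proportions versus the fitted PMF gives a numeric version of the plot. This is a sketch using `MASS::fitdistr()`; the seed and the `0:max(x)` support are illustrative choices.

```r
library(MASS)

set.seed(1)  # arbitrary seed, only for reproducibility
x <- rnbinom(1000, size = 5, prob = 0.3)

# fitdistr() estimates the (size, mu) parameterisation
fit <- fitdistr(x, densfun = "negative binomial")
size_hat <- unname(fit$estimate["size"])
mu_hat   <- unname(fit$estimate["mu"])

# Convert back to (size, prob): prob = size / (size + mu)
prob_hat <- size_hat / (size_hat + mu_hat)
print(c(size = size_hat, prob = prob_hat))  # expected to be close to 5 and 0.3

# Observed proportion of each integer value vs the fitted PMF
support  <- 0:max(x)
observed <- as.numeric(table(factor(x, levels = support))) / length(x)
expected <- dnbinom(support, size = size_hat, mu = mu_hat)
print(head(data.frame(value = support, observed, expected), 15))
```

Because both columns are probabilities per integer value, they are directly comparable, just like the binwidth = 1 histogram and the red line ranges.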