I'm trying to plot a few Binomial distributions and show that as N increases, the curve looks more and more like the normal. I've tried using dbinom, but here's what I get:
Here's the code I'm using to produce this distribution:
x <- -5:250
y10 <- dbinom(x, 10, 0.5)
y30 <- dbinom(x, 30, 0.5)
y60 <- dbinom(x, 60, 0.5)
y100 <- dbinom(x, 100, 0.5)
ynorm <- dnorm(x, mean=-1, sd=1)
y10 <- y10 * sqrt(y10) / 0.8
y30 <- y30 * sqrt(y30) / 0.8
y60 <- y60 * sqrt(y60) / 0.8
y100 <- y100 * sqrt(y100) / 0.8
y10 <- y10[7:17]
y30 <- y30[17:27]
y60 <- y60[32:42]
y100 <- y100[52:62]
plot(range(0, 10), range(0, 0.5), type = "n")]
lines(ynorm, col = "red", type = "l")
lines(y10, col = "blue", type = "l")
lines(y30, col = "orange", type = "l")
lines(y60, col = "green", type = "l")
lines(y100, col = "yellow", type = "l")
Does anyone know how to correctly adjust a binomial distribution in R?
Theoretically an N of 1000 should make it look like a normal distribution, but I have no clue how to get there, and I've tried/failed to use ggplot2 :(
You can rescale the x values so that x==0
always occurs at the peak density for each binomial density. You can do this by finding the x value at which the density is a maximum for each of the densities. For example:
library(ggplot2)
theme_set(theme_classic())
library(dplyr)
x <- -5:250
n = c(6,10,30,60,100)
p = 0.5
binom = data.frame(x=rep(x, length(n)),
y=dbinom(x, rep(n, each=length(x)), p),
n=rep(n, each=length(x)))
ggplot(binom %>% filter(y > 1e-5) %>%
group_by(n) %>%
mutate(x = x - x[which.max(y)]),
aes(x, y, colour=factor(n))) +
geom_line() + geom_point(size=0.6) +
labs(colour="n")
In reference to your comment, here's one way to add a normal density in addition to the binomial density: The mean of a binomial distribution is n*p
, where n
is the number of trials and p
is the probability of success. The variance is n*p*(1-p)
. So, for each of the binomial densities above, we want normal densities with the same mean and variance. We create a data frame of these below and then plot the binomial and normal densities together.
First, create a new vector of x values that includes a higher density of points, to reflect the fact that the normal distribution is continuous, rather than discrete:
x = seq(-5,250,length.out=2000)
Now we create a data frame of normal densities with the same means and variances as the binomial densities above:
normal=data.frame(x=rep(x, length(n)),
y=dnorm(x, rep(n,each=length(x))*p, (rep(n, each=length(x))*p*(1-p))^0.5),
n=rep(n, each=length(x)))
# Cut off y-values below ymin
ymin = 1e-3
So now we have two data frames to plot. We still add the binom
data frame in the main call to ggplot
. But here we also add a call to geom_line
for plotting the normal densities. And we give geom_line
the normal
data frame. Also, for this plot we've used geom_segment
to emphasize the discrete points of the binomial density (you could also use geom_bar
for this).
ggplot(binom %>% filter(y > ymin), aes(x, y)) +
geom_point(size=1.2, colour="blue") +
geom_line(data=normal %>% filter(y > ymin), lwd=0.7, colour="red") +
geom_segment(aes(x=x, xend=x, y=0, yend=y), lwd=0.8, alpha=0.5, colour="blue") +
facet_grid(. ~ n, scales="free", space="free")
Here's what the new plot looks like. You can change the scaling in various ways and there are probably many other ways to tweak it, depending on what you want to emphasize.