Search code examples
rggplot2probability-density

How to plot binomial PDF distributions centered on same mean


I'm trying to plot a few Binomial distributions and show that as N increases, the curve looks more and more like the normal. I've tried using dbinom, but here's what I get:

Screebshot of binominal distribution graph

Here's the code I'm using to produce this distribution:

x <- -5:250
y10 <- dbinom(x, 10, 0.5)
y30 <- dbinom(x, 30, 0.5)
y60 <- dbinom(x, 60, 0.5)
y100 <- dbinom(x, 100, 0.5)
ynorm <- dnorm(x, mean=-1, sd=1)

y10 <- y10 * sqrt(y10) / 0.8
y30 <- y30 * sqrt(y30) / 0.8
y60 <- y60 * sqrt(y60) / 0.8
y100 <- y100 * sqrt(y100) / 0.8

y10 <- y10[7:17]
y30 <- y30[17:27]
y60 <- y60[32:42]
y100 <- y100[52:62]

plot(range(0, 10), range(0, 0.5), type = "n")]
lines(ynorm, col = "red", type = "l")
lines(y10, col = "blue", type = "l")
lines(y30, col = "orange", type = "l")
lines(y60, col = "green", type = "l")
lines(y100, col = "yellow", type = "l")

Does anyone know how to correctly adjust a binomial distribution in R?

Theoretically an N of 1000 should make it look like a normal distribution, but I have no clue how to get there, and I've tried/failed to use ggplot2 :(


Solution

  • You can rescale the x values so that x==0 always occurs at the peak density for each binomial density. You can do this by finding the x value at which the density is a maximum for each of the densities. For example:

    library(ggplot2)
    theme_set(theme_classic())
    library(dplyr)
    
    x <- -5:250
    n = c(6,10,30,60,100)
    p = 0.5
    
    binom = data.frame(x=rep(x, length(n)), 
                       y=dbinom(x, rep(n, each=length(x)), p),
                       n=rep(n, each=length(x)))
    
    ggplot(binom %>% filter(y > 1e-5) %>% 
             group_by(n) %>%
             mutate(x = x - x[which.max(y)]), 
           aes(x, y, colour=factor(n))) +
      geom_line() + geom_point(size=0.6) +
      labs(colour="n")
    

    enter image description here

    In reference to your comment, here's one way to add a normal density in addition to the binomial density: The mean of a binomial distribution is n*p, where n is the number of trials and p is the probability of success. The variance is n*p*(1-p). So, for each of the binomial densities above, we want normal densities with the same mean and variance. We create a data frame of these below and then plot the binomial and normal densities together.

    First, create a new vector of x values that includes a higher density of points, to reflect the fact that the normal distribution is continuous, rather than discrete:

    x = seq(-5,250,length.out=2000)
    

    Now we create a data frame of normal densities with the same means and variances as the binomial densities above:

    normal=data.frame(x=rep(x, length(n)),
                      y=dnorm(x, rep(n,each=length(x))*p, (rep(n, each=length(x))*p*(1-p))^0.5),
                      n=rep(n, each=length(x)))
    
    # Cut off y-values below ymin
    ymin = 1e-3
    

    So now we have two data frames to plot. We still add the binom data frame in the main call to ggplot. But here we also add a call to geom_line for plotting the normal densities. And we give geom_line the normal data frame. Also, for this plot we've used geom_segment to emphasize the discrete points of the binomial density (you could also use geom_bar for this).

    ggplot(binom %>% filter(y > ymin), aes(x, y)) +
      geom_point(size=1.2, colour="blue") +
      geom_line(data=normal %>% filter(y > ymin), lwd=0.7, colour="red") +
      geom_segment(aes(x=x, xend=x, y=0, yend=y), lwd=0.8, alpha=0.5, colour="blue") +
      facet_grid(. ~ n, scales="free", space="free")
    

    Here's what the new plot looks like. You can change the scaling in various ways and there are probably many other ways to tweak it, depending on what you want to emphasize.

    enter image description here