How to plot a 95% confidence interval graph for one sample proportion

qplot(x="SignUp", y=0.07, ymin=Lower_Level, ymax=Upper_Level, ylim=c(0,1), geom = "pointrange")+coord_flip() +
  
  ylab("SignUp Proportion")+geom_hline(yintercept=Upper_Level)+geom_hline(yintercept=Lower_Level)

This is what I managed to plot. But I want something like the graph below. The confidence intervals are 0.084 and 0.0551. The sample proportion is 0.07

Solution

I guess you could show the 95% confidence interval for the estimated probability like this:

First, start with a data frame of 1s and 0s representing your "success" and "failure" rate in the sample. Here, your numbers suggest approximately 105 out of 1500 successes, so we do:

df <- data.frame(x = c(rep(1, 105), rep(0, 1395)))

Now we fit a logistic regression with the intercept being the only parameter we are estimating:

mod <- coef(summary(glm(x ~ 1, family = binomial, data = df)))

mod
#>              Estimate Std. Error  z value      Pr(>|z|)
#> (Intercept) -2.586689  0.1011959 -25.5612 4.122466e-144

The estimate here should be normally distributed (on the log odds scale) with the given estimate and standard error, so we can grab the density values over an appropriate range by doing:

xvals <- seq(mod[1] - 3 * mod[2], mod[1] + 3 * mod[2], 0.01)
yvals <- dnorm(xvals, mod[1], mod[2])

Now we convert the x values from log odds to probabilities:

pxvals <- exp(xvals)/(1 + exp(xvals))

We will also create a vector that labels whether the values are within 1.96 standard deviations of the estimate:

level <- ifelse(xvals < mod[1] - 1.96 * mod[2], "lower",
          ifelse(xvals > mod[1] + 1.96 * mod[2], "upper", "estimate"))

Now we put all of these in a data frame and plot:

plot_df <- data.frame(xvals, yvals, pxvals, level)

library(ggplot2)

ggplot(plot_df, aes(pxvals, yvals, fill = level)) +
  geom_area(alpha = 0.5) +
  geom_vline(xintercept = exp(mod[1])/(1 + exp(mod[1])), linetype = 2) + 
  scale_fill_manual(values = c("gray70", "deepskyblue4", "deepskyblue4"),
                    guide = guide_none()) +
  scale_x_continuous(limits = c(0.03, 0.13), breaks = 3:12/100,
                     name = "probability") +
  theme_bw()

Note that because we have transformed the x axis, this is no longer a genuine density plot. The y axis becomes somewhat arbitrary as a result, but the plot still shows accurately the 95% confidence interval for the probability estimate.

EDIT

Here's an alternative method if the glm approach seems too complicated. It uses the binomial distribution to get the 95% confidence intervals. You just supply it with the population size and the number of "successes"

library(ggplot2)

population <- 1500
actual_successes <- 105
test_successes <- 1:300

density <- dbinom(test_successes, population, actual_successes/population)
probs   <- pbinom(test_successes, population, actual_successes/population)
label   <- ifelse(probs < 0.025, "low", ifelse(probs > 0.975, "high", "CI"))

ggplot(data.frame(probability = test_successes/population, density, label),
       aes(probability, density, fill = label)) +
  geom_area(alpha = 0.5) +
  geom_vline(xintercept = actual_successes/population, linetype = 2) + 
  scale_fill_manual(values = c("gray70", "deepskyblue4", "deepskyblue4"),
                    guide = guide_none()) +
  scale_x_continuous(limits = c(0.03, 0.13), breaks = 3:12/100,
                     name = "probability") +
  theme_bw()