Tags: r, ggplot2, area, normal-distribution

Wrong area on normal curve plot


I'm learning R from scratch and just handed in a college assignment on hypothesis testing for a binomial distribution (a one-sample proportion test), which I solved and plotted in R. I ran into a problem along the way.

My sample size is 130, success cases are 68.

  • H0: π = 50%
  • H1: π > 50%

This is the code I used (plenty of copy-paste and trial and error):

library(ggplot2)
library(ggthemes)
library(scales)


#data

n = 130
p = 1/2
stdev = sqrt(n*p*(1-p))
mean_binon = n*p
cases = 68
ztest = (cases-mean_binon)/stdev
pvalor = pnorm(-abs(ztest))
zcrit = qnorm(0.975)

#normal curve
xvalues <- data.frame(x = c(-4, 4))

#first plots and lines
p1 <- ggplot(xvalues, aes(x = x))
p2 <- p1 + stat_function(fun = dnorm) + xlim(c(-4, 4)) +
    geom_vline(xintercept = ztest, linetype="solid", color="blue", 
               size=1) +
    geom_vline(xintercept = zcrit, linetype="solid", color="red", 
                   size=1)


#z area function
area_z <- function(x){
    norm_z <- dnorm(x)
    norm_z[x < ztest] <- NA
    return(norm_z)
}

#critical z area function
area_zc <- function(x){
    norm_zc <- dnorm(x)
    norm_zc[x < zcrit] <- NA
    return(norm_zc)
}


#area value
valor_area_z <- round(pnorm(4) - pnorm(ztest), 3)
valor_area_zc <- round(pnorm(4) - pnorm(zcrit), 3)


#final plot

p3 <- p2 + stat_function(fun = dnorm) + 
    stat_function(fun = area_z, geom = "area", fill = "blue", alpha = 0.3) +
    geom_text(x = 1.13, y = 0.1, size = 5, fontface = "bold",
              label = paste0(valor_area_z * 100, "%")) +
    stat_function(fun = area_zc, geom = "area", fill = "red", alpha = 0.5) +
    geom_text(x = 2.27, y = 0.015, size = 3, fontface = "bold",
              label = paste0(valor_area_zc * 100, "%")) +
    scale_x_continuous(breaks = c(-3:3)) + 
    labs(x = "\n z", y = "f(z) \n", title = "Distribuição Normal \n") +
    theme_fivethirtyeight()

p3

Here's the plot:

[image: plot produced by the code above]

There is a gap between each geom_vline and its shaded area. I'm not sure whether I got a step wrong in the statistics or whether this is an R problem. Maybe both? Sorry if this is elementary; I'm not good at either yet, but I'm trying to improve.


Solution

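  • One solution is to use the xlim argument of stat_function, which sets the range over which the function is drawn.
    The gap appears because stat_function evaluates the function on an evenly spaced grid (n = 101 points by default), so an area masked with NA starts at the first grid point past ztest or zcrit rather than exactly at the line.
    With xlim the area starts exactly at the given value, so area_z and area_zc can be replaced with plain dnorm.
    Note that the blue fill now stops at zcrit, while its label still reports the whole tail to the right of ztest.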

    p3 <- p2 + stat_function(fun = dnorm) + 
        stat_function(fun = dnorm, geom = "area", fill = "blue", alpha = 0.3, 
                      xlim = c(ztest,zcrit)) +
        geom_text(x = 1.13, y = 0.1, size = 5, fontface = "bold",
                  label = paste0(valor_area_z * 100, "%")) +
        stat_function(fun = dnorm, geom = "area", fill = "red", alpha = 0.5, 
                      xlim = c(zcrit,xvalues$x[2])) +
        geom_text(x = 2.27, y = 0.015, size = 3, fontface = "bold",
                  label = paste0(valor_area_zc * 100, "%")) +
        scale_x_continuous(breaks = c(-3:3)) + 
        labs(x = "\n z", y = "f(z) \n", title = "Distribuição Normal \n") +
        theme_fivethirtyeight()
    
    p3
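
To see where the gap comes from, here is a minimal sketch that reproduces the evaluation grid by hand. It assumes the grid spans exactly -4 to 4 with 101 points (the default n of stat_function); the names grid, first_z, first_zc, gap_z and gap_zc are illustrative and do not appear in the original code.

```r
# Quantities from the question
n <- 130
p <- 1/2
cases <- 68
ztest <- (cases - n * p) / sqrt(n * p * (1 - p))
zcrit <- qnorm(0.975)

# Assumption: an evenly spaced grid of 101 points over [-4, 4],
# mirroring the default n = 101 of stat_function
grid <- seq(-4, 4, length.out = 101)

# First grid point at or beyond each cutoff: an area built from a
# function that returns NA below the cutoff can only start here
first_z  <- min(grid[grid >= ztest])
first_zc <- min(grid[grid >= zcrit])

# Horizontal distance between each vertical line and its shaded area
gap_z  <- first_z - ztest
gap_zc <- first_zc - zcrit

print(c(ztest = ztest, area_starts = first_z, gap = gap_z))
print(c(zcrit = zcrit, area_starts = first_zc, gap = gap_zc))
```

Both gaps are smaller than one grid step (0.08 here), which is why raising n in stat_function would shrink the gap but not remove it, whereas xlim places the first evaluation point exactly on the cutoff.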
    

    [image: plot produced by the code above]
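
On the statistics side of the question: H1 is one-sided (π > 50%), while qnorm(0.975) is the critical value usually paired with a two-sided test at the 5% level. The sketch below assumes a 5% significance level, which the question does not state, and compares the two critical values. The names alpha, p0, se0, zcrit_one_sided, zcrit_two_sided, p_value and reject_h0 are illustrative.

```r
# Data from the question
n <- 130
p0 <- 0.5      # proportion under H0
cases <- 68

# z statistic using the normal approximation to the binomial count
se0 <- sqrt(n * p0 * (1 - p0))
ztest <- (cases - n * p0) / se0

# Assumption: 5% significance level (not stated in the question)
alpha <- 0.05

zcrit_one_sided <- qnorm(1 - alpha)      # matches H1: pi > 50%
zcrit_two_sided <- qnorm(1 - alpha / 2)  # the value used in the question

# Upper-tail p-value for the one-sided alternative
p_value <- pnorm(ztest, lower.tail = FALSE)

# Decision rule for the one-sided test
reject_h0 <- ztest > zcrit_one_sided

print(c(ztest = ztest, p_value = p_value))
print(c(one_sided = zcrit_one_sided, two_sided = zcrit_two_sided))
print(reject_h0)
```

With these numbers ztest falls below both critical values, so the choice does not change the conclusion here, but it does change where the red line and the red area are drawn.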