r, ggplot2, statistics, diagram

How to calculate a probability from a density plot?


I have the following question: is it possible to calculate a probability from a density plot?

For example, I have the following data frame:

 test <- data.frame(
   Gruppe = rep(c("Aktien", "Aktien"), times = c(136, 37)),
   Zufriedenheit = c(f_keineErf, f_Erf)
 )

and I draw a density plot with the ggplot function:

 ggplot(test, aes(x = Zufriedenheit)) + geom_density()

How can I calculate the probability of, for example, getting a value above 70?

Thank you!


Solution

  • Your data is not included in the question, so let's make up a small random sample:

    library(ggplot2)
    
    set.seed(69)
    
    df <- data.frame(x = rnorm(10))
    

    Now we can create a density plot as per your example:

    p <- ggplot(df, aes(x)) + 
      geom_density() +
      xlim(c(-5, 5))
    
    p
    

    Now, we can actually find the x and y coordinates of this line using the base R function density and extracting its x and y components into a data frame:

    dens <- density(df$x)
    d    <- data.frame(x = dens$x, y = dens$y)
    
    head(d)
    #>           x            y
    #> 1 -3.157056 0.0009453767
    #> 2 -3.144949 0.0010145927
    #> 3 -3.132841 0.0010870523
    #> 4 -3.120733 0.0011665920
    #> 5 -3.108625 0.0012488375
    #> 6 -3.096517 0.0013382316
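
As a quick sanity check on that data frame, the sketch below inspects the grid that density() returns. By default the curve is evaluated at n = 512 equally spaced points. The variable names (steps, evenly_spaced, riemann_total) are illustrative choices, and the sample is the same made-up one as above.

```r
# Same made-up sample as above
set.seed(69)
x <- rnorm(10)

dens <- density(x)

# density() evaluates the curve at n = 512 points unless told otherwise
length(dens$x)

# The spacing between neighbouring x values is constant (up to rounding)
steps <- diff(dens$x)
evenly_spaced <- isTRUE(all.equal(min(steps), max(steps)))
evenly_spaced

# A simple Riemann sum of the curve should be close to 1,
# as expected for a probability density
riemann_total <- sum(dens$y) * steps[1]
riemann_total
```

If a finer or coarser grid is wanted, density() accepts n, from and to arguments to control it.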
    

    Plotting this as a red dashed geom_line shows that it is the same curve as the one drawn by geom_density:

    p + geom_line(data = d, aes(x, y), col = "red", linetype = 2, size = 2) 
    

    Now suppose we want to know the probability of having a value of more than one. We can show the area we are interested in like this:

    p + geom_area(data = d[d$x >= 1,], aes(x, y), fill = "red")
    

    Since the x values in our data frame d are all equally spaced, the red area's share of the total area under the line is simply the ratio of the sum of the y values at x values greater than one to the grand sum of y:

    sum(d$y[d$x > 1])/sum(d$y)
    #> [1] 0.1599931
    

    So the estimated probability of getting an x value greater than 1 is about 0.16, or 16%.
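
The simple ratio can be cross-checked with a trapezoidal integration that starts exactly at the threshold instead of at the nearest grid point. This is only a sketch: trapz and tail_prob are hypothetical helper names, not part of base R or ggplot2, and the result is still an estimate that depends on the bandwidth density() picks.

```r
# Trapezoidal rule for points (x, y) sorted by x
trapz <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Hypothetical helper: estimated P(X > threshold) under the kernel density
tail_prob <- function(x, threshold, ...) {
  dens <- density(x, ...)
  # Thresholds outside the evaluated grid: all or none of the area lies above
  if (threshold <= min(dens$x)) return(1)
  if (threshold >= max(dens$x)) return(0)
  # Interpolate the curve at the threshold so the area starts exactly there
  y_thr <- approx(dens$x, dens$y, xout = threshold)$y
  keep  <- dens$x > threshold
  upper <- trapz(c(threshold, dens$x[keep]), c(y_thr, dens$y[keep]))
  # Normalise by the total area under the evaluated curve
  upper / trapz(dens$x, dens$y)
}

set.seed(69)
x <- rnorm(10)

p_trapz <- tail_prob(x, 1)
p_trapz  # should be close to the value from the simple ratio
```

Extra arguments such as bw or adjust are passed through to density(), which makes it easy to see how sensitive the estimate is to the amount of smoothing.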

    Created on 2020-08-17 by the reprex package (v0.3.0)
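
Applied to the original question, the same ratio can be taken at a threshold of 70. The asker's vectors f_keineErf and f_Erf were not posted, so the sketch below substitutes simulated scores. The simulated values, the clamping to the range 0 to 100, and the names zufriedenheit and threshold are assumptions for illustration only. It also computes the raw sample proportion, which is an unsmoothed alternative estimate.

```r
# Hypothetical stand-in for the asker's data: 173 satisfaction scores in [0, 100]
set.seed(1)
zufriedenheit <- pmin(pmax(rnorm(173, mean = 60, sd = 15), 0), 100)

threshold <- 70

# Density-based estimate: ratio of summed y values on the equally spaced grid
dens <- density(zufriedenheit)
p_density <- sum(dens$y[dens$x > threshold]) / sum(dens$y)

# Unsmoothed alternative: the share of observations above the threshold
p_empirical <- mean(zufriedenheit > threshold)

c(density = p_density, empirical = p_empirical)
```

With the real data, zufriedenheit would be replaced by test$Zufriedenheit. The two estimates usually differ a little, because the kernel density spreads each observation over its neighbourhood.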