Tags: r, statistics, mean, cdf

Can the mean() function give the probability from a cumulative distribution function?


I was working on my assignment and ran into something I don't understand. I wrote this code for question #1:

x <- heights$height[heights$sex=="Male"]

The next question reads:

"We will define a function 'CDF' as follows: CDF <- function(a) {mean(x<=a)} Explain why the CDF function is a cumulative distribution function."

I understand the idea of a cumulative distribution function, but I don't see why the mean() function is used there.

For example, CDF(70) equals 0.623, which is the value of the cumulative distribution at 70. How does the mean() function produce a probability here?


Solution

  • When you do a logical test, like x <= a, the result is a logical vector of TRUE and FALSE values, one per element of x. When you do arithmetic on logical values, each TRUE is treated as 1 and each FALSE as 0. A common way to count the values of x that are less than or equal to a is therefore sum(x <= a). Similarly, the proportion of x values that are less than or equal to a is sum(x <= a) / length(x), which is the same as mean(x <= a). That proportion is the empirical estimate of P(X <= a), which is exactly what a cumulative distribution function returns at a. So CDF(70) = 0.623 means that about 62.3% of the heights in x are 70 or less.
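
To see the equivalence concretely, here is a minimal sketch. The vector x below is made-up data for illustration, not the heights dataset from the assignment, and the variable names below, count, and proportion are hypothetical. It compares the manual proportion with mean(x <= a) and with base R's ecdf(), which builds the empirical CDF of a sample.

```r
# Made-up heights for illustration only (not the assignment's data)
x <- c(64, 66, 68, 69, 70, 71, 72, 74)
a <- 70

below <- x <= a            # logical vector: TRUE where the value is <= a
as.numeric(below)          # arithmetic view: TRUE becomes 1, FALSE becomes 0

count      <- sum(below)              # how many values are <= a
proportion <- count / length(x)       # what share of the values are <= a

# Same definition as in the question
CDF <- function(a) mean(x <= a)

# Both comparisons should be TRUE
isTRUE(all.equal(proportion, CDF(a)))   # manual proportion vs. mean()
isTRUE(all.equal(CDF(a), ecdf(x)(a)))   # mean() vs. R's built-in empirical CDF
```

With this toy vector, 5 of the 8 values are at most 70, so every version returns 0.625. The same reasoning applies to the real x from the question, only with more values.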