I was working on my assignment and found something strange. I wrote this code for question #1:
x <- heights$height[heights$sex=="Male"]
and the next question is like this:
"We will define a function "CDF" like following:
CDF <- function(a) {mean(x<=a)}
Explain why the CDF function is Cumulative Distribution Function."
I get the idea of the cumulative distribution function, but I don't get why the mean() function is used there.
For example, CDF(70) equals 0.623, which is the cumulative probability at 70. How does the mean() function show probability in this function?
When you do a logical test, like x <= a, the result will be a boolean vector of TRUE and FALSE values. When you do math on boolean TRUE/FALSE values, TRUEs are treated as 1 and FALSEs are treated as 0. A common way to count the number of values of x that are less than or equal to a is sum(x <= a). Similarly, if you wanted to know what proportion of x values are less than or equal to a, you could do sum(x <= a) / length(x), which is the same as mean(x <= a). That proportion is the empirical probability that a value of x is at most a, which is exactly what a cumulative distribution function reports.
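To make the coercion concrete, here is a minimal sketch. It uses a small made-up vector of heights rather than the heights data from the assignment, so the values of x and a and the helper names below, count and proportion, are assumptions for illustration only.

```r
# Hypothetical heights in inches (made-up values, not the assignment's data)
x <- c(64, 67, 68, 70, 71, 73, 75, 69)
a <- 70

below <- x <= a          # logical vector: TRUE where a height is at most a
as.numeric(below)        # shows the coercion: TRUE becomes 1, FALSE becomes 0

count <- sum(below)               # how many values are <= a
proportion <- count / length(x)   # share of values that are <= a

# Same definition as in the question, applied to the made-up x
CDF <- function(a) mean(x <= a)

# mean() of the 0/1 values is the same proportion computed by hand
all.equal(proportion, CDF(a))
```

Because mean() adds up the 0s and 1s and divides by the number of elements, it returns the fraction of TRUE values directly, so no separate call to length() is needed.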