
Why does mean(NA, na.rm = TRUE) return NaN


When taking the mean of a vector that contains only NAs, we get NaN if na.rm = TRUE. Why is this? Is this flawed logic, or is there something I'm missing? Surely it would make more sense to return NA than NaN?

Quick example below:

mean(NA, na.rm = TRUE)
#[1] NaN

mean(rep(NA, 10), na.rm = TRUE)
#[1] NaN
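A short sketch of what na.rm = TRUE leaves behind: once the missing values are dropped, nothing remains, so mean() is effectively applied to an empty vector. The is.na() filtering below is only an illustration of the removal step, not a copy of mean()'s internal code. Note that a bare NA is of type logical, so what remains is an empty logical vector.

```r
x <- rep(NA, 10)

# Drop the missing values, which is what na.rm = TRUE asks for
# (illustrative filtering, not mean()'s actual source)
kept <- x[!is.na(x)]

length(kept)                                   # 0: nothing is left
mean(kept)                                     # NaN
identical(mean(x, na.rm = TRUE), mean(kept))   # TRUE
```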

Solution

  • It is a bit of a pity that ?mean says nothing about this. My comment only told you that applying mean to an empty "numeric" vector results in NaN, without further reasoning. Rui Barradas's comment tried to explain this but was not accurate, as division by 0 is not always NaN: it can also be Inf or -Inf. I once discussed this in R: element-wise matrix division. Still, we are getting close. Although mean(x) is not implemented as sum(x) / length(x), this mathematical fact really does explain the NaN.

    From ?sum:

     *NB:* the sum of an empty set is zero, by definition.
    

    So sum(numeric(0)) is 0. As length(numeric(0)) is also 0, mean(numeric(0)) amounts to 0 / 0, which is NaN.
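A small check of the arithmetic in this answer, evaluated in plain R. It only reproduces the identities stated above (the sum of an empty vector is 0, its length is 0, and 0 / 0 is NaN) next to the other division-by-zero cases; as noted, mean() itself is not implemented as sum(x) / length(x), so the sum(e) / length(e) line is just the mathematical identity, not R's internal code path.

```r
# Division by zero in R depends on the numerator
1 / 0     # Inf
-1 / 0    # -Inf
0 / 0     # NaN

# The sum of an empty set is zero, and so is its length
e <- numeric(0)
sum(e)               # 0
length(e)            # 0
sum(e) / length(e)   # NaN, i.e. 0 / 0
mean(e)              # NaN
```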