Tags: r, statistics, built-in, geometric-mean

Geometric Mean: is there a built-in?


I tried to find a built-in for geometric mean but couldn't.

(Obviously, a built-in isn't going to save me any time while working in the shell, nor do I suspect there's any difference in accuracy. For scripts, though, I try to use built-ins as often as possible, since the cumulative performance gain is often noticeable.)

In case there isn't one (which I doubt), here's mine:

gm_mean = function(a){prod(a)^(1/length(a))}
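
On the accuracy point, a minimal sketch with made-up inputs may help: it compares the product-based definition with an equivalent form computed on the log scale. The names gm_prod and gm_log and the test vectors are illustrative assumptions, not part of the question, and the log form assumes all values are positive.

```r
# Product-based definition, as in the question.
gm_prod = function(a) prod(a)^(1 / length(a))

# Same quantity computed on the log scale (assumes all values are positive).
gm_log = function(a) exp(mean(log(a)))

small = c(1, 10, 100)
big = rep(1e10, 100)   # prod(big) would be 1e1000, beyond double range

gm_prod(small)  # about 10
gm_log(small)   # about 10
gm_prod(big)    # Inf, because the intermediate product overflows
gm_log(big)     # about 1e10
```

For short vectors of moderate values the two forms agree, so the difference only matters when the intermediate product can overflow or underflow.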

Solution

  • Here is a vectorized, zero- and NA-tolerant function for calculating the geometric mean in R. The mean is written out as sum(...) / length(x) rather than with mean() so that non-positive values, which are excluded from the sum of logs, still count toward the length.

    gm_mean = function(x, na.rm=TRUE){
      exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))
    }
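
    A few quick checks of the function above, using made-up inputs, show how zeros and NAs enter the result: both are left out of the sum of logs but still count toward length(x). This is only a sketch of the behaviour, not a full test suite.

    ```r
    gm_mean = function(x, na.rm=TRUE){
      exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))
    }

    gm_mean(c(1, 10, 100))     # about 10
    gm_mean(c(1, 10, 100, 0))  # about 5.62 (10^(3/4)): the zero adds nothing to the sum but 1 to the length
    gm_mean(c(4, NA))          # about 2: the NA is dropped from the sum but still counted in the length
    ```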
    

    Thanks to @ben-bolker for noting the na.rm pass-through and @Gregor for making sure it works correctly.

    I think some of the comments stem from a false equivalence between NA values in the data and zeros. In the application I had in mind they are the same, but of course this is not generally true. Thus, if you want optional propagation of zeros, and want length(x) treated differently when NAs are removed, the following is a slightly longer alternative to the function above.

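    gm_mean = function(x, na.rm=TRUE, zero.propagate = FALSE){
      # The geometric mean is not defined for negative values.
      if(any(x < 0, na.rm = TRUE)){
        return(NaN)
      }
      if(zero.propagate){
        # Any zero forces the result to zero.
        if(any(x == 0, na.rm = TRUE)){
          return(0)
        }
        # mean() drops NAs from both the sum and the count when na.rm = TRUE.
        exp(mean(log(x), na.rm = na.rm))
      } else {
        # Zeros and NAs are left out of the sum but still count in length(x).
        exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))
      }
    }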
    

    Note that it also checks for negative values and returns NaN if any are present, which is more informative and appropriate given that the geometric mean is not defined for negative values (but is for zeros). Thanks to the commenters who stayed on my case about this.
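
    To make the difference between the two branches concrete, here is a small sketch that calls the longer variant above on made-up inputs. The function body is repeated only so that the snippet runs on its own.

    ```r
    gm_mean = function(x, na.rm=TRUE, zero.propagate = FALSE){
      if(any(x < 0, na.rm = TRUE)){
        return(NaN)
      }
      if(zero.propagate){
        if(any(x == 0, na.rm = TRUE)){
          return(0)
        }
        exp(mean(log(x), na.rm = na.rm))
      } else {
        exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))
      }
    }

    gm_mean(c(1, 10, 100, 0))                         # about 5.62: zero not propagated
    gm_mean(c(1, 10, 100, 0), zero.propagate = TRUE)  # 0
    gm_mean(c(4, NA))                                 # about 2: NA still counted in length(x)
    gm_mean(c(4, NA), zero.propagate = TRUE)          # about 4: NA removed by mean(..., na.rm = TRUE)
    gm_mean(c(-1, 2))                                 # NaN
    ```

    With zero.propagate = TRUE an NA no longer pulls the result down, because mean() drops it from both the sum and the count.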