r, dplyr, data.table, aggregate, geometric-mean

Calculate geometric mean by ID across entire long data frame in R


In R, I am trying to calculate the geometric mean, exp(mean(log(x, na.rm=T))), across all columns of a data frame, grouped by participant ID. The data frame is in long format. Below is comparable code showing what I have so far, but it isn't working. I have also tried data.table, still without success. Any help is appreciated.

 mtcars_sub <- mtcars[,1:2]
 mtcars_sub_gm <- mtcars_sub %>% 
                         group_by(cyl) %>% 
                              summarise_all(function (x) exp(mean(log(x, na.rm=TRUE))))  

 gm_vars <- names(mtcars_sub )[1] #this is very simplistic, but in my actual program there are +80 columns
 mtcars_sub_gm <- mtcars_sub [,lapply(.SD, function(x) {exp(mean(log(x, na.rm=T)))}), by = 
                             cyl, .SDcols = gm_vars] 

Solution

  • The issue is the placement of na.rm = TRUE: it is an argument of mean(), but it was placed inside the log() call, and log() does not accept an na.rm argument. Moving it out to mean() gives the geometric mean with missing values dropped:

    library(dplyr)
    mtcars[,1:5] %>% 
      group_by(cyl) %>% 
      summarize(across(everything(), ~exp(mean(log(.x), na.rm=TRUE))))
    
    # A tibble: 3 × 5
        cyl   mpg  disp    hp  drat
      <dbl> <dbl> <dbl> <dbl> <dbl>
    1     4  26.3  102.  80.1  4.06
    2     6  19.7  180. 121.   3.56
    3     8  14.9  347. 204.   3.21
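
Since the question also attempted data.table, the same fix can be sketched there. This is a minimal sketch, not a tested drop-in for the real data: it assumes the data frame is first converted with as.data.table() (the `[, j, by = ]` syntax does not work on a plain data.frame), and the helper name geo_mean and the way gm_vars is built are illustrative choices rather than anything from the original post.

```r
library(data.table)

# Convert to a data.table first; a plain data.frame does not
# understand the [i, j, by] syntax used below.
dt <- as.data.table(mtcars[, 1:5])

# Hypothetical helper: na.rm belongs to mean(), not to log().
geo_mean <- function(x) exp(mean(log(x), na.rm = TRUE))

# Every column except the grouping column (scales to 80+ columns).
gm_vars <- setdiff(names(dt), "cyl")

# Apply the helper to each selected column within each cyl group,
# then sort the result by the grouping column.
dt_gm <- dt[, lapply(.SD, geo_mean), by = cyl, .SDcols = gm_vars][order(cyl)]

print(dt_gm)
```

The values should match the dplyr output above; only the row order before sorting differs, because data.table keeps groups in order of first appearance.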