Search code examples
dplyrtidyverse

calculating standardized effect size for ecological data without loops functions


I am willing to calculate the standardized effect size for soil N among 2 soil water levels (optimal, reduced) and two diversity levels (High versus Low) with 5 replicates each. I am using the formula ES = (soil_N optimal - Soil_N reduced) / (soil_N optimal + soil_N reduced), separately for each replicate and diversity level (High and Low) that I can use to plot a graph. Would it be possible to do this calculation without using loops? I am looking forward to a dplyr/tidyverse solution to this problem. I am very much looking forward to your simple answers.

df<- data.frame(Soilwater = c("optimal", "optimal", "optimal", "optimal", "optimal", 
   "reduced", "reduced", "reduced", "reduced", "reduced", 
   "optimal", "optimal", "optimal", "optimal", "optimal", 
   "reduced", "reduced", "reduced", "reduced", "reduced"), 
  Diversity = c("High","High","High","High","High","High","High","High","High","High",   
   "Low", "Low", "Low","Low","Low","Low","Low","Low","Low","Low"),
 Soil_N = c(50,45, 49, 48, 49, 69, 68, 69, 70, 67, 79, 78, 79, 78, 77, 89, 89, 87, 88, 89), 
 Replicate = c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5))

For ex., for high diversity, the effect size can be calculated as (50-79)/(50+79) = -0.2248, (45-78)/(45-78) = -0.2682, (49-79)/(49+79) = -0.2343, (48-78)/(48+78) = -0.2381, (49-77)/(49+77) = -0.2222. the average of these 5 replicates is -0.2375. same should be done for low diversity level. I have used the following code but I am getting the value for only 1 replicate and not for all 5 replicates per treatment. Any help would be highly appreciated!

df%>%  
rowwise()%>%    
dplyr::mutate(Replicate = row_number())%>%    
dplyr::group_by(Soilwater, Diversity, Replicate)%>%     
dplyr::summarise(Soil_N = mean(Soil_N))%>% 
tidyr::spread(key = (Soilwater), value = Soil_N)%>%    
dplyr::mutate(ES = (Optimal-Reduced)/(Optimal+Reduced))

Solution

  • If I'm understanding your question, I think it's just a case of reordering your functions, with a couple of notes:

    • Your manual calculations seem to be subtracting low-diversity-optimal from high-diversity-optimal (50-79; so grouping by Soilwater) but your question seems to want to group by Diversity? Have proceeded with the second of these, but can be easily changed
    • I used pivot_wider instead of spread - does the same thing, but I find it easier to understand!
    • Your rowwise() %>% mutate(Replicate = row_number()) part was just assigning all rows the Replicate value of 1, so grouping them all as one observation. Don't think that's what you were aiming to do so have taken out.

    Edited to deal with missing variables

    One option for accounting for NAs is simply to use mean(ES, na.rm = TRUE) in calculating the mean - effectively removing missing rows from calculations:

    a)

    library(dplyr)
    library(tidyr)
    
    
    df <- tibble(
      Soilwater = rep(rep(c("optimal", "reduced"), each = 5), times = 2),
      Diversity = rep(c("high", "low"), each = 10),
      Soil_N = c(50, 45, 49, 48, 49, 69, 68, 69, 70, 67, 79, 78, 79, NA_integer_, 77, 89, 89, 87, 88, 89),
      Replicate = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5)
    )
    
    df %>%
      pivot_wider(
        id_cols = c(Diversity, Replicate),
        names_from = Soilwater,
        values_from = Soil_N
      ) %>%
      mutate(ES = (optimal - reduced) / (optimal + reduced)) %>%
      group_by(Diversity) %>%
      summarise(mean_ES = mean(ES, na.rm = TRUE))
    
    #> # A tibble: 2 x 2
    #>   Diversity mean_ES
    #>   <chr>       <dbl>
    #> 1 high      -0.175 
    #> 2 low       -0.0615
    
    

    Another option would be to impute missing values from the mean of the combined Soilwater/Diversity group for each measurement of optimal and reduced.

    b)

    df %>%
      pivot_wider(
        id_cols = c(Diversity, Replicate),
        names_from = Soilwater,
        values_from = Soil_N
      ) %>%
      group_by(Diversity) %>%
      mutate(across(c(optimal, reduced), ~ replace_na(.x, mean(.x, na.rm = TRUE))),
        ES = (optimal - reduced) / (optimal + reduced)
      ) %>%
      summarise(mean_ES = mean(ES, na.rm = TRUE))
    
    #> # A tibble: 2 x 2
    #>   Diversity mean_ES
    #>   <chr>       <dbl>
    #> 1 high      -0.175 
    #> 2 low       -0.0609
    

    Created on 2022-03-17 by the reprex package (v2.0.1)