Search code examples
rgtsummary

How to add pooled standard error in tbl_summary() and eta effect size?


I am trying to include Pooled Standard Error (PSE) and Eta square to tbl_summary(). PSE is calculated by sqrt(mean(residuals^2)/n), I tried to calculate step by step by extracting the residuals from either aov() or lm(), but I got the error saying The dimension of respected variable and the added statistic do not match. Expecting statistic/dataframe to be length/ no. rows 1. Here is my code:

PSE <- function(data, variable, by,...) {
    aov(data[["variable"]] ~ as.factor(data[[by]]))$residuals
  }

Dataset_TPA_Full %>%
  select(diet,hardness_g,adhesiveness_g_sec, resilence, cohesion, springiness, gumminess, chewiness, firmness_g_force_1, density_g_l)%>% 
  tbl_summary(
    by = diet,
    statistic = all_continuous() ~ "{mean} ± {sem}",
    label = list(hardness_g = "Hardness (g)", 
                 adhesiveness_g_sec = "Adhesiveness (g/ sec)", 
                 resilence = "Resilience", 
                 cohesion = "Cohesion", 
                 springiness = "Springiness", 
                 gumminess = "Gumminess", 
                 chewiness = "Chewiness", 
                 firmness_g_force_1 = "Firmness (g)", 
                 density_g_l = "Density (g/ L)")
    ) %>% 
    add_p(
      test = all_continuous() ~ "aov",
    ) %>%
  add_stat(fns = all_continuous() ~ PSE) %>% 
  modify_header(label = "**Treatment**", p.value = "**p-value**") %>%
  bold_labels() %>%
  bold_levels()

Also when I tried to add Eta squared using this code, it return missing data argument when I put it in the add_stat() function

my_ES_test <- function(data, variable, by, ...) {
  aovmod = aov(data[[variable]] ~ data[[by]])
  lsr::etaSquared(aovmod)[1,1]
}

Can you help me with this? Thank you.


Solution

  • This should do it:

    library(gtsummary)
    library(dplyr)
    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    
    sem <- function(x){
      sqrt(var(x, na.rm=TRUE)/sum(!is.na(x)))
    }
    
    PSE <- function(data, variable, by,...) {
      e <- aov(data[[variable]] ~ as.factor(data[[by]]))$residuals
      sqrt(mean(e^2)/length(e))
    }
    
    mtcars %>%
      select(cyl, mpg, hp, disp, drat, qsec)%>% 
      tbl_summary(
        by = cyl,
        statistic = all_continuous() ~ "{mean} ± {sem}",
        label = list(mpg = "Miles per Gallon", 
                     hp = "Horsepower", 
                     disp = "Displacement", 
                     drat = "Rear Axel Ratio", 
                     qsec = "1/4 Mile Time")
      ) %>% 
      add_p(
        test = all_continuous() ~ "aov",
      ) %>%
      add_stat(fns = all_continuous() ~ PSE) %>% 
      modify_header(label = "**Treatment**", p.value = "**p-value**", add_stat_1 = "**PSE**") %>%
      bold_labels() %>%
      bold_levels()
    

    enter image description here

    Created on 2022-04-17 by the reprex package (v2.0.1)

    Note, the PSE() function had two problems. First, the data[["variable"]] should be data[[variable]] (without the quotes around variable). Second, you had the function return the residuals, not the PSE calculation you described in the question. Now, it returns the appropriate result. I also am not sure where you got the sem() function, so I just made one that calculates the standard error of the mean.


    Updated PSE function

    PSE <- function(data, variable, by,...) {
      s <- data %>% 
        group_by(!!sym(by)) %>% 
        summarise(s = var(!!sym(variable)), 
                  n = n()) %>% 
        mutate(num = s*(n-1))
      psd <- sqrt(sum(s$num)/(sum(s$n) - nrow(s)))
      psd*sqrt(sum(1/s$n))
    }