Search code examples
rfunctionfunctional-programmingtidyevaltukey

Getting tidy output from post hoc tests


Consider this data frame dat1:

dat1 <- data.frame(Region = rep(c("r1","r2"), each = 100),
                   State = rep(c("NY","MA","FL","GA"), each = 10),
                   Loc = rep(c("a","b","c","d","e","f","g","h"),each = 5),
                   ID = rep(c(1:10), each=2),
                   var1 = rnorm(200),
                   var2 = rnorm(200),
                   var3 = rnorm(200),
                   var4 = rnorm(200),
                   var5 = rnorm(200))

I have data frames that are similar to dat1 created above. Region,State,and Loc are grouping variables for each observation ID, and 5 measurements are taken upon each observation var1:var5. For each grouping variable, I am conducting univariate anovas on each var. When significant differences are found, I am using the TukeyHSD() function and the multcompLetters() function from the multcompView package to generate CLD's on the groups. Since I want to do this for each grouping variable I am trying to write a function to keep myself from being repetitive and making typos. Below shows where I am with this:

library(tidyverse)
library(multcomp)
library(multcompView)

Tuk <- function(dat,groupvar,var){
  TUK <- TukeyHSD(aov(lm(get(var) ~ get(groupvar), data=dat)))
  names(TUK)[[1]] <- paste0(groupvar)
  lets<-multcompLetters(extract_p(TUK$groupvar))
  lets
}
#assuming all 5 vars were significant in the anovas, I would then run this for each grouping variable as follows:
vars <- paste0(names(dat1[,5:9]))
#by Region
lapply(vars, FUN=Tuk, dat=dat1, groupvar="Region")
#by State
lapply(vars, FUN=Tuk, dat=dat1, groupvar="State")
#by Loc
lapply(vars, FUN=Tuk, dat=dat1, groupvar="Loc")

The code works outside of the function. The function will create the model, but I cant figure out how to format it so that it recognizes what groupvar is for the multcompLetters(extract_p()) part? How can I fix this, and how can I get it the function to output a tidy table that shows each group and the letters for each variable I give it at once. For example it would look something like this for State using all 5 variables

     NY   MA   FL   GA
var1  a    ab   c    a
var2  a    ab   b    c
var3  a    c    ab   bc  
var4  ab   c    ab   ab 
var5  a    b    c     b

Also, is there a reasonable way to make this function produce boxplots (for each variable) of the groups that show the CLD letters?


Solution

  • Assuming the plot is truly what you're looking for does this get you pretty close for a sing plot of var1 ~ State, Indrajeet has done a great job of building this package and I hate reinventing wheels.

    dat1 <- data.frame(Region = rep(c("r1","r2"), each = 100),
                       State = rep(c("NY","MA","FL","GA"), each = 10),
                       Loc = rep(c("a","b","c","d","e","f","g","h"),each = 5),
                       ID = rep(c(1:10), each=2),
                       var1 = rnorm(200),
                       var2 = rnorm(200),
                       var3 = rnorm(200),
                       var4 = rnorm(200),
                       var5 = rnorm(200))
    
    library(ggstatsplot)
    
    ggbetweenstats(dat1, State, var1, 
                   plot.type = "box", 
                   pairwise.comparisons = TRUE, 
                   pairwise.display = "everything")
    
    #> Note: Shapiro-Wilk Normality Test for var1: p-value = 0.183
    #> 
    #> Note: Bartlett's test for homogeneity of variances for factor State: p-value = 0.373
    #>