r function functional-programming tidyeval tukey

Getting tidy output from post hoc tests

Consider this data frame dat1:

dat1 <- data.frame(Region = rep(c("r1","r2"), each = 100),
                   State = rep(c("NY","MA","FL","GA"), each = 10),
                   Loc = rep(c("a","b","c","d","e","f","g","h"),each = 5),
                   ID = rep(c(1:10), each=2),
                   var1 = rnorm(200),
                   var2 = rnorm(200),
                   var3 = rnorm(200),
                   var4 = rnorm(200),
                   var5 = rnorm(200))

I have data frames that are similar to dat1 created above. Region,State,and Loc are grouping variables for each observation ID, and 5 measurements are taken upon each observation var1:var5. For each grouping variable, I am conducting univariate anovas on each var. When significant differences are found, I am using the TukeyHSD() function and the multcompLetters() function from the multcompView package to generate CLD's on the groups. Since I want to do this for each grouping variable I am trying to write a function to keep myself from being repetitive and making typos. Below shows where I am with this:

library(tidyverse)
library(multcomp)
library(multcompView)

Tuk <- function(dat,groupvar,var){
  TUK <- TukeyHSD(aov(lm(get(var) ~ get(groupvar), data=dat)))
  names(TUK)[[1]] <- paste0(groupvar)
  lets<-multcompLetters(extract_p(TUK$groupvar))
  lets
}
#assuming all 5 vars were significant in the anovas, I would then run this for each grouping variable as follows:
vars <- paste0(names(dat1[,5:9]))
#by Region
lapply(vars, FUN=Tuk, dat=dat1, groupvar="Region")
#by State
lapply(vars, FUN=Tuk, dat=dat1, groupvar="State")
#by Loc
lapply(vars, FUN=Tuk, dat=dat1, groupvar="Loc")

The code works outside of the function. The function will create the model, but I cant figure out how to format it so that it recognizes what groupvar is for the multcompLetters(extract_p()) part? How can I fix this, and how can I get it the function to output a tidy table that shows each group and the letters for each variable I give it at once. For example it would look something like this for State using all 5 variables

     NY   MA   FL   GA
var1  a    ab   c    a
var2  a    ab   b    c
var3  a    c    ab   bc  
var4  ab   c    ab   ab 
var5  a    b    c     b

Also, is there a reasonable way to make this function produce boxplots (for each variable) of the groups that show the CLD letters?

Solution

Assuming the plot is truly what you're looking for does this get you pretty close for a sing plot of var1 ~ State, Indrajeet has done a great job of building this package and I hate reinventing wheels.

dat1 <- data.frame(Region = rep(c("r1","r2"), each = 100),
                   State = rep(c("NY","MA","FL","GA"), each = 10),
                   Loc = rep(c("a","b","c","d","e","f","g","h"),each = 5),
                   ID = rep(c(1:10), each=2),
                   var1 = rnorm(200),
                   var2 = rnorm(200),
                   var3 = rnorm(200),
                   var4 = rnorm(200),
                   var5 = rnorm(200))

library(ggstatsplot)

ggbetweenstats(dat1, State, var1, 
               plot.type = "box", 
               pairwise.comparisons = TRUE, 
               pairwise.display = "everything")

#> Note: Shapiro-Wilk Normality Test for var1: p-value = 0.183
#> 
#> Note: Bartlett's test for homogeneity of variances for factor State: p-value = 0.373
#>