
Error when trying to run pairwise comparison in a loop in R


I get the error below when I run this code. The ANOVA and the ANOVA table steps run without issue. Why is column in pairwise_t_test not treated the same way as the dv = column argument in anova_test?

Error in `pull()`:
! Can't extract columns that don't exist.
✖ Column `column` doesn't exist.

library(rstatix)

nnames <- names(df)[unlist(lapply(df, is.numeric))]
res.aov <- list()
aov_tab <- list()
pc <- list()
pc1 <- list()

for (column in nnames) {
  res.aov[[column]] <- anova_test(data = df, dv = column, 
                                  wid = `Subject`, within = `Timepoint`, between = `Genotype`)
  aov_tab[[column]] <- get_anova_table(res.aov[[column]])
  
  pc[[column]]<- df %>% pairwise_t_test(column ~`Timepoint`, paired=TRUE, p.adjust.method = "holm")
  pc[[column]]<- pc[[column]] %>% add_xy_position(x="Timepoint")
  
  pc1[[column]]<- df %>% group_by(Timepoint) %>% pairwise_t_test(column ~ `Genotype`)
  pc1[[column]]<- pc1[[column]] %>% add_xy_position(x= "Timepoint")
  } 

dataframe

dput(df)
structure(list(Subject = c("ASCVD002", "ASCVD002", "ASCVD002", 
"ASCVD003", "ASCVD003", "ASCVD003", "ASCVD004", "ASCVD004", "ASCVD004", 
"ASCVD005", "ASCVD005", "ASCVD005", "ASCVD006", "ASCVD006", "ASCVD006", 
"ASCVD008", "ASCVD008", "ASCVD008", "ASCVD009", "ASCVD009", "ASCVD009", 
"ASCVD010", "ASCVD010", "ASCVD010", "ASCVD011", "ASCVD011", "ASCVD011"
), Timepoint = c("0", "0.25", "0.5", "0", "0.25", "0.5", "0", 
"0.25", "0.5", "0", "0.25", "0.5", "0", "0.25", "0.5", "0", "0.25", 
"0.5", "0", "0.25", "0.5", "0", "0.25", "0.5", "0", "0.25", "0.5"
), Genotype = c("Heterozygote", "Heterozygote", "Heterozygote", 
"Heterozygote", "Heterozygote", "Heterozygote", "Heterozygote", 
"Heterozygote", "Heterozygote", "GG", "GG", "GG", "AA", "AA", 
"AA", "GG", "GG", "GG", "AA", "AA", "AA", "AA", "AA", "AA", "GG", 
"GG", "GG"), `Tregs CD127lo CD25+` = c(2702, 2175, 2651, 1672.8, 
3762, 4264, 1975, 3208, 3285, 3457, 3383, 2619.9, 11872, 16101, 
13443, 3935, 1894, 2297, 7385, 8901, 9522, 7100, 8789, 9309, 
371, 379, 514), `Monocytes % of Live by Size` = c(1.38, 2.66, 
4.74, 5.83, 3.9, 5.06, 6.36, 3.45, 2.64, 6.33, 10.7, 9.41, 3.42, 
3.46, 2.73, 2.38, 3.12, 4.44, 5.31, 3.59, 4.91, 1.53, 6.54, 4.85, 
6.87, 3.66, 5.07), `NK cells` = c(90.62, 153.6, 159.8, 88, 118, 
159, 74, 82, 64, 30, 344, 73, 29, 198, 79, 145, 258, 307, 30, 
74.4, 0, 47.3, 32, 0, 52.6, 95.3, 51.7)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -27L))

When I run the same calls outside the loop with a specific column name, they work without the error.


Solution

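  • In pairwise_t_test you pass a formula that contains the literal symbol column. Inside the loop, the object column is a length-1 character vector holding the name (!) of the variable you are interested in, not the values of that variable. The formula is taken as written, so pairwise_t_test looks for a data frame column that is literally called "column" and fails. The formula needs to contain the actual variable name, not an object that refers to the variable.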

    You can avoid this by constructing the formula like this:

    pairwise_t_test(as.formula(paste0("`", column, "`", " ~ Timepoint")), ...)
    

    The same applies to the second call to pairwise_t_test, with Genotype on the right-hand side instead of Timepoint.

    By the way, the variable names in nnames are awkward to work with. With simpler names (no spaces or special characters) you would not need the "`" in the code.
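
    The difference between the literal formula and the constructed one can be checked in base R, without rstatix. The following is only a minimal sketch: the small data frame is a cut-down stand-in that reuses the first nine `NK cells` values from the question, the names literal, built, means and clean_names are made up for illustration, and aggregate() is used here merely as a convenient base R function with a formula interface, not as a replacement for pairwise_t_test.

```r
# Cut-down stand-in for the question's data (first three subjects only).
df <- data.frame(
  Timepoint = rep(c("0", "0.25", "0.5"), times = 3),
  check.names = FALSE
)
df[["NK cells"]] <- c(90.62, 153.6, 159.8, 88, 118, 159, 74, 82, 64)

# Inside the loop, `column` holds a column NAME as text.
column <- "NK cells"

# Written literally, the formula refers to a variable called "column".
literal <- column ~ Timepoint
all.vars(literal)

# Built from text, the formula contains the actual (backticked) column name.
built <- as.formula(paste0("`", column, "`", " ~ Timepoint"))
all.vars(built)

# The built formula works with a formula interface such as aggregate().
means <- aggregate(built, data = df, FUN = mean)

# Optional: syntactic names make the backticks unnecessary.
clean_names <- make.names(names(df))
```

    In the loop from the question, an object like built would be passed where column ~ Timepoint is written now. If you rename the columns with something like make.names(), the names stored in the result lists change accordingly.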