
Handling Errors for Multiple T-tests when using Tidy and Do


I've been using the functions do and tidy to perform multiple t-tests on a data frame that has been grouped by category ahead of time. However, the values in some groups are constant, and I usually have to filter these groups out for the test to work, since it otherwise returns Error in t.test.default(.$y) : data are essentially constant. I'm looking for a way to perform the t-tests using do and tidy as normal, but instead of filtering out the constant categories, have the estimate column hold the constant value for that category and the other columns be NA.

Example Dataframe:

trial<-data.frame(
  type=rep(c("A","B","C"),times=2,length.out=10),
  y=c(1,2,3,1,3,2,1,2,3,1)
)

Data frame:

   type y
1     A 1
2     B 2
3     C 3
4     A 1
5     B 3
6     C 2
7     A 1
8     B 2
9     C 3
10    A 1

Filtered T-test:

trial.ttest<-trial %>% 
  group_by(type) %>% 
  filter(!type=="A") %>% 
  do(tidy(t.test(.$y)))

Filtered T-test Results:

  type estimate statistic    p.value parameter  conf.low
1    B 2.333333         7 0.01980394         2 0.8991158
2    C 2.666667         8 0.01526807         2 1.2324491
  conf.high            method alternative
1  3.767551 One Sample t-test   two.sided
2  4.100884 One Sample t-test   two.sided

I've tried the following code, which uses tryCatch and tribble to do this, but I end up with Error: C stack usage 15923504 is too close to the limit.

trial.ttest<-trial %>% 
  group_by(type) %>% 
  do(tidy(tryCatch(t.test(.$y),error=function(e){
    tribble(
      ~estimate,
      .$y,NA,NA,NA,NA,NA,NA,NA
    )
  })))

If I forgo using tribble and just have tryCatch return the value it adds it in a new column named x.

Code:

trial.ttest<-trial %>% 
  group_by(type) %>% 
  do(tidy(tryCatch(t.test(.$y),error=function(e){
    .$y
  })))

Results:

  type  x estimate statistic    p.value parameter  conf.low
1    A  1       NA        NA         NA        NA        NA
2    A  1       NA        NA         NA        NA        NA
3    A  1       NA        NA         NA        NA        NA
4    A  1       NA        NA         NA        NA        NA
5    B NA 2.333333         7 0.01980394         2 0.8991158
6    C NA 2.666667         8 0.01526807         2 1.2324491
  conf.high            method alternative
1        NA              <NA>        <NA>
2        NA              <NA>        <NA>
3        NA              <NA>        <NA>
4        NA              <NA>        <NA>
5  3.767551 One Sample t-test   two.sided
6  4.100884 One Sample t-test   two.sided

Is there a way to have the constant values go into the estimate column, with the method column set to Constant Value and all other columns NA, instead of getting a new column as in the last bit of code?

EDIT 1:

I forgot to add the desired resulting data frame.

Desired Result:

  type estimate statistic    p.value parameter  conf.low conf.high            method alternative
1    A 1.000000        NA         NA        NA        NA        NA    Constant Value        <NA>
2    B 2.333333         7 0.01980394         2 0.8991158  3.767551 One Sample t-test   two.sided
3    C 2.666667         8 0.01526807         2 1.2324491  4.100884 One Sample t-test   two.sided

Edit 2: Solution Attempt A

Code:

library(dplyr)
library(purrr)
library(broom)
trial %>% 
  split(.$type) %>% 
  map_if(.p = ~length(unique(.$y))>1, 
         .f = ~tidy(t.test(.$y)), 
         .else = ~tibble(estimate=.$y[1], method="Constant Value")) %>% 
  bind_rows(.id = 'type')

Result:

  type type  y estimate statistic    p.value parameter  conf.low
1    A    A  1       NA        NA         NA        NA        NA
2    A    A  1       NA        NA         NA        NA        NA
3    A    A  1       NA        NA         NA        NA        NA
4    A    A  1       NA        NA         NA        NA        NA
5    B <NA> NA 2.333333         7 0.01980394         2 0.8991158
6    C <NA> NA 2.666667         8 0.01526807         2 1.2324491
  conf.high            method alternative
1        NA              <NA>        <NA>
2        NA              <NA>        <NA>
3        NA              <NA>        <NA>
4        NA              <NA>        <NA>
5  3.767551 One Sample t-test   two.sided
6  4.100884 One Sample t-test   two.sided

Solution

  • Here is one option using purrr::map_if: apply t.test only to groups where length(unique(y)) > 1 (equivalently, n_distinct(y) > 1), and build a one-row tibble by hand for the constant groups.

    library(dplyr)
    library(purrr)
    library(broom)
    trial %>% 
      split(.$type) %>% 
      map_if(.p = ~length(unique(.$y))>1, 
             .f = ~tidy(t.test(.$y)), 
             .else = ~tibble(estimate=.$y[1], method="Constant Value")) %>% 
      bind_rows(.id = 'type')
    
    # A tibble: 3 x 9
      type  estimate method            statistic p.value parameter conf.low conf.high alternative
      <chr>    <dbl> <chr>                 <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>      
    1 A         1    Constant Value          NA  NA             NA   NA         NA    NA         
    2 B         2.33 One Sample t-test        7.  0.0198         2    0.899      3.77 two.sided  
    3 C         2.67 One Sample t-test        8   0.0153         2    1.23       4.10 two.sided
    

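    PS: Using purrr >= 0.3.2. The .else argument of map_if is not available in older purrr versions, where the constant groups are returned unchanged.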
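
    For anyone who wants to stay closer to the do/tidy/tryCatch pattern from the question, here is a minimal sketch of an alternative. The idea is to call tidy inside tryCatch, so the error handler can return an already-tidy one-row tibble instead of a raw vector that tidy then has to interpret. The helper name safe_ttest is made up for this example, and the sketch assumes that any error from t.test means the group is constant, which holds for this data but may not hold in general.

```r
library(dplyr)
library(broom)

trial <- data.frame(
  type = rep(c("A", "B", "C"), length.out = 10),
  y = c(1, 2, 3, 1, 3, 2, 1, 2, 3, 1)
)

# Hypothetical helper: returns the tidied t-test, or a one-row fallback
# tibble when t.test throws (assumed here to mean "data are constant").
safe_ttest <- function(y) {
  tryCatch(
    tidy(t.test(y)),
    error = function(e) {
      # tibble() is re-exported by dplyr
      tibble(estimate = y[1], method = "Constant Value")
    }
  )
}

trial.ttest <- trial %>%
  group_by(type) %>%
  do(safe_ttest(.$y))

trial.ttest
```

    Because do row-binds the per-group data frames, the columns missing from the fallback tibble are filled with NA for the constant group. Checking the number of distinct values up front, as in the map_if version, is the more explicit option, since it does not swallow unrelated errors.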