I've been using the functions do and tidy to perform multiple t-tests on a data frame that has been grouped by category ahead of time. However, some of the values in the data frame are constant, and I usually have to filter these out for the test to work, since otherwise it returns Error in t.test.default(.$y) : data are essentially constant. I'm looking for a way to perform the t-test using do and tidy as normal, but instead of filtering out the categories that are constant, have the estimate column use the value of that category and the other columns be NA.
Example Dataframe:
trial<-data.frame(
type=rep(c("A","B","C"),times=2,length.out=10),
y=c(1,2,3,1,3,2,1,2,3,1)
)
Data frame:
type y
1 A 1
2 B 2
3 C 3
4 A 1
5 B 3
6 C 2
7 A 1
8 B 2
9 C 3
10 A 1
Filtered T-test:
trial.ttest<-trial %>%
group_by(type) %>%
filter(!type=="A") %>%
do(tidy(t.test(.$y)))
Filtered T-test Results:
type estimate statistic p.value parameter conf.low
1 B 2.333333 7 0.01980394 2 0.8991158
2 C 2.666667 8 0.01526807 2 1.2324491
conf.high method alternative
1 3.767551 One Sample t-test two.sided
2 4.100884 One Sample t-test two.sided
I've tried the following code, which uses tryCatch and tribble to do this, but I end up with the error Error: C stack usage 15923504 is too close to the limit.
trial.ttest<-trial %>%
group_by(type) %>%
do(tidy(tryCatch(t.test(.$y),error=function(e){
tribble(
~estimate,
.$y,NA,NA,NA,NA,NA,NA,NA
)
})))
If I forgo tribble and just have tryCatch return the value, it adds the value in a new column named x.
Code:
trial.ttest<-trial %>%
group_by(type) %>%
do(tidy(tryCatch(t.test(.$y),error=function(e){
.$y
})))
Results:
type x estimate statistic p.value parameter conf.low
1 A 1 NA NA NA NA NA
2 A 1 NA NA NA NA NA
3 A 1 NA NA NA NA NA
4 A 1 NA NA NA NA NA
5 B NA 2.333333 7 0.01980394 2 0.8991158
6 C NA 2.666667 8 0.01526807 2 1.2324491
conf.high method alternative
1 NA <NA> <NA>
2 NA <NA> <NA>
3 NA <NA> <NA>
4 NA <NA> <NA>
5 3.767551 One Sample t-test two.sided
6 4.100884 One Sample t-test two.sided
Is there a way to have the constant values go into the estimate column, with the method column set to Constant Value and all other columns being NA, instead of creating a new column as in the last bit of code?
EDIT 1:
I forgot to add the desired resulting data frame.
Desired Result:
type estimate statistic p.value parameter conf.low conf.high method alternative
1 A 1.000000 NA NA NA NA NA Constant Value <NA>
2 B 2.333333 7 0.01980394 2 0.8991158 3.767551 One Sample t-test two.sided
3 C 2.666667 8 0.01526807 2 1.2324491 4.100884 One Sample t-test two.sided
Edit 2: Solution Attempt A
Code:
library(dplyr)
library(purrr)
library(broom)
trial %>%
split(.$type) %>%
map_if(.p = ~length(unique(.$y))>1,
.f = ~tidy(t.test(.$y)),
.else = ~tibble(estimate=.$y[1], method="Constant Value")) %>%
bind_rows(.id = 'type')
Result:
type type y estimate statistic p.value parameter conf.low
1 A A 1 NA NA NA NA NA
2 A A 1 NA NA NA NA NA
3 A A 1 NA NA NA NA NA
4 A A 1 NA NA NA NA NA
5 B <NA> NA 2.333333 7 0.01980394 2 0.8991158
6 C <NA> NA 2.666667 8 0.01526807 2 1.2324491
conf.high method alternative
1 NA <NA> <NA>
2 NA <NA> <NA>
3 NA <NA> <NA>
4 NA <NA> <NA>
5 3.767551 One Sample t-test two.sided
6 4.100884 One Sample t-test two.sided
Here is one option using purrr::map_if: we apply t.test only if length(unique(y)) > 1 (or n_distinct(y) > 1).
library(dplyr)
library(purrr)
library(broom)
trial %>%
split(.$type) %>%
map_if(.p = ~length(unique(.$y))>1,
.f = ~tidy(t.test(.$y)),
.else = ~tibble(estimate=.$y[1], method="Constant Value")) %>%
bind_rows(.id = 'type')
# A tibble: 3 x 9
type estimate method statistic p.value parameter conf.low conf.high alternative
<chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 A 1 Constant Value NA NA NA NA NA NA
2 B 2.33 One Sample t-test 7. 0.0198 2 0.899 3.77 two.sided
3 C 2.67 One Sample t-test 8 0.0153 2 1.23 4.10 two.sided
PS: This uses purrr >= 0.3.2; the .else argument of map_if is not available in older releases.
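If you would rather keep the do/tidy workflow from the question, a minimal sketch is to move tidy inside tryCatch, so that the error handler returns a ready-made data frame instead of a raw value that tidy then turns into an x column. This assumes do fills missing columns with NA when it binds the per-group results (as it did in the question's own output), and the name res is only there for illustration. Note that the handler catches any error from t.test, not just the constant-data one.

```r
library(dplyr)
library(broom)

# Example data from the question
trial <- data.frame(
  type = rep(c("A", "B", "C"), length.out = 10),
  y = c(1, 2, 3, 1, 3, 2, 1, 2, 3, 1)
)

res <- trial %>%
  group_by(type) %>%
  do(tryCatch(
    # Normal case: tidy the t-test result into a one-row data frame
    tidy(t.test(.$y)),
    # Fallback (assumption: the only expected failure is constant data):
    # return a one-row data frame holding the constant value
    error = function(e) tibble(estimate = .$y[1], method = "Constant Value")
  ))

res
```

Because both branches return a data frame with an estimate and a method column, the remaining columns for the constant group should simply come out as NA.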