I want to use regex to identify both the variable to group_by and the variables to summarize, so that I can summarize my data efficiently. I cannot do this separately for each variable because I have a large number of variables to summarize, and the grouping variable needs to be passed dynamically each time. data.table accepts a regex for the grouping variable, but not for the summarizing variables. My attempts so far using tidyverse have been unsuccessful as well. Any help would be much appreciated.
My data:
tempDF <- structure(list(d1 = c("A", "B", "C", "A", "C"), d2 = c(40L, 50L, 20L, 50L, 20L),
d3 = c(20L, 40L, 50L, 40L, 50L), d4 = c(60L, 30L, 30L,60L, 30L), p_A = c(1L,
3L, 2L, 3L, 2L), p_B = c(3L, 4L, 3L, 3L, 4L), p_C = c(2L, 1L, 1L,2L, 1L), p4 = c(5L,
5L, 4L, 5L, 4L)), class = "data.frame", row.names = c(NA, -5L))
View(tempDF)
lLevels<-c("d1")
lContinuum<-c("p_A", "p_B", "p_C")
My attempts:
setDT(tempDF)[ , list(group_means = mean(eval((paste0(lContinuum)))), by=eval((paste0(lLevels))))]
group_means by
1: NA d1
Warning message:
In mean.default(eval((paste0(lContinuum)))) :
argument is not numeric or logical: returning NA
But a single variable works:
setDT(tempDF)[ , list(group_means = mean(p_A)), by=eval((paste0(lLevels)))]
setDT(tempDF)[ , list(group_means = mean(p_B)), by=eval((paste0(lLevels)))]
setDT(tempDF)[ , list(group_means = mean(p_C)), by=eval((paste0(lLevels)))]
Expected output:
tempDF %>%
group_by(d1) %>%
summarise(p_A_mean = mean(p_A), p_B_mean = mean(p_B), p_C_mean = mean(p_C))
# A tibble: 3 x 4
d1 p_A_mean p_B_mean p_C_mean
<chr> <dbl> <dbl> <dbl>
1 A 2 3 2
2 B 3 4 1
3 C 2 3.5 1
The data.table approach is very simple:
library(data.table)
setDT(tempDF)
tempDF[, lapply(.SD, mean),
by = lLevels,
.SDcols = lContinuum]
d1 p_A p_B p_C
1: A 2 3.0 2
2: B 3 4.0 1
3: C 2 3.5 1
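If the `_mean` suffix from the expected output matters, one option is to rename the summarized columns afterwards. This is only a sketch: it rebuilds the relevant columns of the question's data so it runs on its own, and the name `res` is an illustrative choice.

```r
library(data.table)

# Subset of the question's sample data (grouping column plus the p_* columns)
tempDF <- data.table(
  d1  = c("A", "B", "C", "A", "C"),
  p_A = c(1L, 3L, 2L, 3L, 2L),
  p_B = c(3L, 4L, 3L, 3L, 4L),
  p_C = c(2L, 1L, 1L, 2L, 1L)
)
lLevels    <- c("d1")
lContinuum <- c("p_A", "p_B", "p_C")

# Group means for every column named in lContinuum
res <- tempDF[, lapply(.SD, mean), by = lLevels, .SDcols = lContinuum]

# Rename by reference: p_A -> p_A_mean, and so on
setnames(res, lContinuum, paste0(lContinuum, "_mean"))
res
```

Because `setnames()` modifies `res` in place, no reassignment is needed.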
A similar approach in dplyr would be:
library(dplyr)
tempDF %>%
  group_by_at(lLevels) %>%
  summarize_at(lContinuum, mean)
# A tibble: 3 x 4
d1 p_A p_B p_C
<chr> <dbl> <dbl> <dbl>
1 A 2 3 2
2 B 3 4 1
3 C 2 3.5 1
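In dplyr 1.0.0 and later, the `_at` variants are superseded by `across()`. The sketch below assumes such a version is installed; it takes the same character vectors and also produces the `_mean` names from the expected output. The data is a subset of the question's columns, and `res` is an illustrative name.

```r
library(dplyr)

# Subset of the question's sample data (grouping column plus the p_* columns)
tempDF <- data.frame(
  d1  = c("A", "B", "C", "A", "C"),
  p_A = c(1L, 3L, 2L, 3L, 2L),
  p_B = c(3L, 4L, 3L, 3L, 4L),
  p_C = c(2L, 1L, 1L, 2L, 1L)
)
lLevels    <- c("d1")
lContinuum <- c("p_A", "p_B", "p_C")

res <- tempDF %>%
  # all_of() turns a character vector into a column selection
  group_by(across(all_of(lLevels))) %>%
  # .names controls the output names: p_A -> p_A_mean, and so on
  summarise(across(all_of(lContinuum), mean, .names = "{.col}_mean"),
            .groups = "drop")
res
```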
In either case, you can replace lLevels and lContinuum with regex. The dplyr option would also allow for select helpers such as starts_with() and ends_with(): https://www.rdocumentation.org/packages/tidyselect/versions/0.2.5/topics/select_helpers
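As a rough illustration of the regex route in data.table, the column names can be built with `grep()` on `names(tempDF)`, or a pattern can be passed straight to `.SDcols` via `patterns()` (which assumes data.table 1.12.0 or later). The patterns `"^d1$"` and `"^p_"`, and the names `grpCols`, `valCols`, `res` and `res2`, are assumptions chosen to fit the sample data.

```r
library(data.table)

# Full sample data from the question
tempDF <- data.table(
  d1  = c("A", "B", "C", "A", "C"),
  d2  = c(40L, 50L, 20L, 50L, 20L),
  d3  = c(20L, 40L, 50L, 40L, 50L),
  d4  = c(60L, 30L, 30L, 60L, 30L),
  p_A = c(1L, 3L, 2L, 3L, 2L),
  p_B = c(3L, 4L, 3L, 3L, 4L),
  p_C = c(2L, 1L, 1L, 2L, 1L),
  p4  = c(5L, 5L, 4L, 5L, 4L)
)

# Resolve regexes to column names up front
grpCols <- grep("^d1$", names(tempDF), value = TRUE)  # "d1"
valCols <- grep("^p_", names(tempDF), value = TRUE)   # p_A, p_B, p_C (not p4)

res <- tempDF[, lapply(.SD, mean), by = grpCols, .SDcols = valCols]

# Alternative: let .SDcols match the regex itself
res2 <- tempDF[, lapply(.SD, mean), by = grpCols, .SDcols = patterns("^p_")]
res
```

On the dplyr side, the tidyselect helper `matches()` takes a regex in the same position where `starts_with()` or `ends_with()` would go.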