I have a df with several variables, and I want to summarise them all in one step, using a different summary function depending on the type of each variable.
The difficulty is that the variable type comes from a separate metadata df, not from the usual tests (like "is.numeric" etc.).
Below is a reprex. I guess I should use a "match" inside the "where", and I don't even know whether two different across calls can go in the same summarise. Can they?
Any idea how to write two tests that work?
Thanks
# a df
df <- data.frame(ID = letters[1:15],
                 Group = sample(1:3, 15, replace = TRUE),
                 Var1 = sample.int(15),
                 Var2 = sample.int(15),
                 Var3 = sample.int(15),
                 Var4 = sample.int(15))
# another df with metadata on the variables (their type)
metaVar <- data.frame(Var = c("Var1", "Var2", "Var3", "Var4"),
                      Type = c(rep("stock", 2), rep("ratio", 2)))
## summarise across different variables
# using sum for "stock" type
# and mean for "ratio" type
groupDF <- df %>%
  group_by(Group) %>%
  summarise(across(where(names(.) %in% metaVar[metaVar$Type == "stock", ]$Var), # not working
                   sum, na.rm = TRUE),
            across(where(names(.) %in% metaVar[metaVar$Type == "ratio", ]$Var), # not working
                   mean, na.rm = TRUE)) %>%
  ungroup()
# Problem while evaluating `where(names(.) %in% metaVar[metaVar$Type == "stock", ]$Var)`
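You are overcomplicating this: there is no need for where(), nor for names(.) %in%. where() expects a predicate function that is applied to each column, not a logical vector built from the names. across() accepts a character vector of column names directly, so you can index metaVar for the names you want. And yes, two across() calls can go in the same summarise().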
suppressPackageStartupMessages({
library(dplyr)
})
## summarise across different variables
# using sum for "stock" type
# and mean for "ratio" type
groupDF <- df %>%
  group_by(Group) %>%
  summarise(across(metaVar$Var[metaVar$Type == "stock"], \(x) sum(x, na.rm = TRUE)),
            across(metaVar$Var[metaVar$Type == "ratio"], \(x) mean(x, na.rm = TRUE))) %>%
  ungroup()
groupDF
#> # A tibble: 3 × 5
#> Group Var1 Var2 Var3 Var4
#> <int> <int> <int> <dbl> <dbl>
#> 1 1 23 13 6.67 6
#> 2 2 47 69 8.5 9.67
#> 3 3 50 38 8.17 7.33
Created on 2023-03-22 with reprex v2.0.2
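A variant of the code above, in case it helps: the column names can be pulled out of metaVar first and wrapped in all_of(), which makes it explicit that they are an external character vector and gives an error if one of them is missing from df. This is only a sketch. The small df below is a made-up deterministic stand-in for the sampled one in the question, and stock_vars and ratio_vars are names introduced here for illustration.

```r
library(dplyr)

# Made-up, deterministic stand-in for the df in the question
df <- data.frame(
  ID    = letters[1:6],
  Group = c(1, 1, 2, 2, 3, 3),
  Var1  = c(1, 2, 3, 4, 5, NA),
  Var2  = c(10, 20, 30, 40, 50, 60),
  Var3  = c(2, 4, 6, 8, 10, 12),
  Var4  = c(1, 3, 5, 7, NA, 11)
)

# Same metadata layout as in the question
metaVar <- data.frame(
  Var  = c("Var1", "Var2", "Var3", "Var4"),
  Type = c("stock", "stock", "ratio", "ratio")
)

# Column names per type, looked up once (hypothetical helper names)
stock_vars <- metaVar$Var[metaVar$Type == "stock"]
ratio_vars <- metaVar$Var[metaVar$Type == "ratio"]

groupDF <- df %>%
  group_by(Group) %>%
  summarise(
    across(all_of(stock_vars), \(x) sum(x, na.rm = TRUE)),   # sum for "stock"
    across(all_of(ratio_vars), \(x) mean(x, na.rm = TRUE))   # mean for "ratio"
  ) %>%
  ungroup()

groupDF
```

The result has one row per Group, with Var1 and Var2 summed and Var3 and Var4 averaged, as in the answer above.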
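I have used anonymous functions because passing extra arguments such as na.rm = TRUE through the ... of across() is deprecated. With the original form, across(cols, sum, na.rm = TRUE), dplyr gives this warning: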
#> Warning: There was 1 warning in `summarise()`.
#> ℹ In argument: `across(metaVar$Var[metaVar$Type == "stock"], sum, na.rm =
#> TRUE)`.
#> ℹ In group 1: `Group = 1`.
#> Caused by warning:
#> ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
#> Supply arguments directly to `.fns` through an anonymous function instead.
#>
#> # Previously
#> across(a:b, mean, na.rm = TRUE)
#>
#> # Now
#> across(a:b, \(x) mean(x, na.rm = TRUE))
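If there are more than two types, one possible alternative is a single across() that looks up each column's type with match(), which is close to what the question had in mind. This is a sketch under assumptions, not the only way to do it: summary_fns is a hypothetical lookup list introduced here, the data is a made-up deterministic stand-in, and every Type in metaVar is assumed to have an entry in summary_fns. It relies on dplyr's cur_column(), which returns the name of the column currently being processed inside across().

```r
library(dplyr)

# Made-up, deterministic stand-in for the df in the question
df <- data.frame(
  ID    = letters[1:6],
  Group = c(1, 1, 2, 2, 3, 3),
  Var1  = c(1, 2, 3, 4, 5, NA),
  Var2  = c(10, 20, 30, 40, 50, 60),
  Var3  = c(2, 4, 6, 8, 10, 12),
  Var4  = c(1, 3, 5, 7, NA, 11)
)

metaVar <- data.frame(
  Var  = c("Var1", "Var2", "Var3", "Var4"),
  Type = c("stock", "stock", "ratio", "ratio")
)

# Hypothetical lookup: one summary function per variable type
summary_fns <- list(
  stock = \(x) sum(x, na.rm = TRUE),
  ratio = \(x) mean(x, na.rm = TRUE)
)

groupDF <- df %>%
  group_by(Group) %>%
  summarise(
    across(all_of(metaVar$Var), \(x) {
      # Find the type of the column being summarised, then apply its function
      type <- as.character(metaVar$Type[match(cur_column(), metaVar$Var)])
      summary_fns[[type]](x)
    })
  ) %>%
  ungroup()

groupDF
```

Adding a new type then only needs a new row in metaVar and a new entry in summary_fns; the summarise() call stays the same.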