
R: simultaneously summarize several variables by changing the aggregation function according to the type entered in a metadata table


I have a df with several variables, and I want to summarise them all in one call, applying a different aggregation function depending on each variable's type.

The difficulty is that I want to take the variable type from a separate metadata df rather than from the usual tests (such as "is.numeric").

Below is a reprex. I guess I should use a "match" inside the "where", but I don't even know whether two different across calls can go in the same summarise. Can they?

Any idea on how to write two proper tests that work?

Thanks


library(dplyr)

# a df
df <- data.frame(ID = letters[1:15],
                 Group = sample(1:3, 15, replace = TRUE),
                 Var1 = sample.int(15),
                 Var2 = sample.int(15),
                 Var3 = sample.int(15),
                 Var4 = sample.int(15))

# another df with meta data on variables = type 

metaVar <- data.frame(Var = c("Var1", "Var2", "Var3", "Var4"),
                     Type = c(rep("stock", 2), rep("ratio", 2))) 

## summarise across different variables 
# using sum for "stock" type
# and mean for "ratio" type

groupDF <- df %>% 
  group_by(Group) %>%
  summarise(across(where(names(.) %in% metaVar[metaVar$Type == "stock", ]$Var), # not working
                   sum, na.rm = TRUE),
            across(where(names(.) %in% metaVar[metaVar$Type == "ratio", ]$Var), # not working
                   mean, na.rm = TRUE)) %>% # 
  ungroup

# Problem while evaluating `where(names(.) %in% metaVar[metaVar$Type == "stock", ]$Var)`


Solution

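  • You are overcomplicating this: there is no need for where() or for names(.) %in%. where() expects a predicate function that is applied to each column, not a logical vector, which is why the call fails. across() accepts a character vector of column names directly, and two across() calls can go in the same summarise().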

    suppressPackageStartupMessages({
      library(dplyr)
    })
    
    ## summarise across different variables 
    # using sum for "stock" type
    # and mean for "ratio" type
    
    groupDF <- df %>% 
      group_by(Group) %>%
      summarise(across(metaVar$Var[metaVar$Type == "stock"], \(x) sum(x, na.rm = TRUE)),
                across(metaVar$Var[metaVar$Type == "ratio"], \(x) mean(x, na.rm = TRUE))) %>%
      ungroup()
    
    groupDF
    #> # A tibble: 3 × 5
    #>   Group  Var1  Var2  Var3  Var4
    #>   <int> <int> <int> <dbl> <dbl>
    #> 1     1    23    13  6.67  6   
    #> 2     2    47    69  8.5   9.67
    #> 3     3    50    38  8.17  7.33
    

    Created on 2023-03-22 with reprex v2.0.2
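
    The same selection can be written more explicitly with all_of(), the tidyselect helper for selecting columns from a character vector; it raises an error if a listed column is missing from the data. A minimal sketch, assuming the df and metaVar from the question. The helper vectors stock_vars and ratio_vars and the set.seed() call are additions for illustration, not part of the original answer:

    ```r
    library(dplyr)

    set.seed(1)  # assumption: fixed seed so the example is reproducible
    df <- data.frame(ID = letters[1:15],
                     Group = sample(1:3, 15, replace = TRUE),
                     Var1 = sample.int(15),
                     Var2 = sample.int(15),
                     Var3 = sample.int(15),
                     Var4 = sample.int(15))

    metaVar <- data.frame(Var = c("Var1", "Var2", "Var3", "Var4"),
                          Type = c(rep("stock", 2), rep("ratio", 2)))

    # hypothetical helper vectors: column names per type, read from the metadata
    stock_vars <- metaVar$Var[metaVar$Type == "stock"]
    ratio_vars <- metaVar$Var[metaVar$Type == "ratio"]

    groupDF <- df %>%
      group_by(Group) %>%
      summarise(across(all_of(stock_vars), \(x) sum(x, na.rm = TRUE)),
                across(all_of(ratio_vars), \(x) mean(x, na.rm = TRUE)),
                .groups = "drop")  # same effect as a trailing ungroup()

    groupDF
    ```

    If some names in the metadata might not exist in df, any_of() could be swapped in for all_of() to skip them silently instead of failing.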


    Note

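    I have used anonymous functions because passing extra arguments through the `...` of across(), as in the question, is deprecated as of dplyr 1.1.0 and produces this warning: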

    #> Warning: There was 1 warning in `summarise()`.
    #> ℹ In argument: `across(metaVar$Var[metaVar$Type == "stock"], sum, na.rm =
    #>   TRUE)`.
    #> ℹ In group 1: `Group = 1`.
    #> Caused by warning:
    #> ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
    #> Supply arguments directly to `.fns` through an anonymous function instead.
    #> 
    #>   # Previously
    #>   across(a:b, mean, na.rm = TRUE)
    #> 
    #>   # Now
    #>   across(a:b, \(x) mean(x, na.rm = TRUE))
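
    If more types are added to the metadata later, writing one across() call per type becomes repetitive. One possible generalisation, offered as a sketch rather than as part of the original answer, is to keep a named list of aggregation functions and look up each column's type with cur_column(). The names agg_funs and var_type and the set.seed() call are assumptions made for this example:

    ```r
    library(dplyr)

    set.seed(1)  # assumption: fixed seed so the example is reproducible
    df <- data.frame(ID = letters[1:15],
                     Group = sample(1:3, 15, replace = TRUE),
                     Var1 = sample.int(15),
                     Var2 = sample.int(15),
                     Var3 = sample.int(15),
                     Var4 = sample.int(15))

    metaVar <- data.frame(Var = c("Var1", "Var2", "Var3", "Var4"),
                          Type = c(rep("stock", 2), rep("ratio", 2)))

    # hypothetical lookup: one aggregation function per value of metaVar$Type
    agg_funs <- list(stock = \(x) sum(x, na.rm = TRUE),
                     ratio = \(x) mean(x, na.rm = TRUE))

    # named vector mapping column name to type, e.g. var_type[["Var1"]] is "stock"
    var_type <- setNames(metaVar$Type, metaVar$Var)

    groupDF <- df %>%
      group_by(Group) %>%
      summarise(across(all_of(metaVar$Var),
                       # cur_column() returns the name of the column being processed
                       \(x) agg_funs[[var_type[[cur_column()]]]](x)),
                .groups = "drop")

    groupDF
    ```

    With this layout, supporting a new type only requires a new entry in agg_funs and the matching rows in metaVar; the summarise() call itself stays unchanged.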