r dplyr numeric across

Using summarise and across in the dplyr package while distinguishing between numeric and non-numeric columns


I would like to perform some operations using dplyr on a dataset that looks like:

data <- data.frame(day = c(rep(1, 15), rep(2, 15)), nweek = rep(rep(1:5, 3),2), 
                   firm = rep(sapply(letters[1:3], function(x) rep(x, 5)), 2), 
                   quant = rnorm(30), price = runif(30) )

where each observation is at the day, week and firm level (there are only 2 days in a week).

I would like to summarise the data, grouping by firm and week, by (1) taking the average across the days of the week for the numeric variables (i.e., quant and price), and (2) taking the first entry for the non-numeric variables. In this example the only non-numeric variable is firm, but my real dataset has several (Date and character), and they may change within a week (nweek), so I want to keep only the entry from the first day of the week for each of them.

I tried using summarise and across but get an error:

> data %>% group_by(firm, nweek) %>% dplyr::summarise(across(which(sapply(data, is.numeric)), ~ mean(.x, na.rm = TRUE)),
+                           across(which(sapply(data, !(is.numeric))), ~ head(.x, 1))
+ )
Error: Problem with `summarise()` input `..2`.
x invalid argument type
ℹ Input `..2` is `across(which(sapply(data, !(is.numeric))), ~head(.x, 1))`.
Run `rlang::last_error()` to see where the error occurred.

Any help?


Solution

  • I don't know what your expected output should look like, but something like this should get you close to what you are after:

    library(dplyr)

    data %>%
      group_by(firm, nweek) %>%
      summarise(
        across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
        across(!where(is.numeric), ~ head(.x, 1))
      )
    

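    As a side note, the error comes from !(is.numeric): the ! operator is applied to the function itself rather than to its result, which raises "invalid argument type". Instead of which(sapply(...)), use the where helper for conditional selection of variables inside across; it can be negated directly, as in !where(is.numeric).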

    Output

    # A tibble: 15 x 5
    # Groups:   firm [3]
       firm  nweek   day   quant price
       <chr> <int> <dbl>   <dbl> <dbl>
     1 a         1   1.5 -0.336  0.903
     2 a         2   1.5  0.0837 0.579
     3 a         3   1.5  0.0541 0.425
     4 a         4   1.5  1.21   0.555
     5 a         5   1.5  0.462  0.806
     6 b         1   1.5  0.0493 0.346
     7 b         2   1.5  0.635  0.596
     8 b         3   1.5  0.406  0.583
     9 b         4   1.5 -0.707  0.205
    10 b         5   1.5  0.157  0.816
    11 c         1   1.5  0.728  0.271
    12 c         2   1.5  0.117  0.775
    13 c         3   1.5 -1.05   0.234
    14 c         4   1.5 -1.35   0.290
    15 c         5   1.5  0.771  0.310
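
The following is a minimal reproducible sketch of the same approach, assuming dplyr 1.0.0 or later. The fixed seed and the note column are hypothetical additions, not part of the question: the seed makes the run repeatable, and note is a character column that changes between day 1 and day 2, so it is visible that the first-day entry is the one kept. Sorting by day before summarising is also an assumption, added so that "first entry" reliably means "first day of the week".

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only to make the example repeatable

data <- data.frame(
  day   = c(rep(1, 15), rep(2, 15)),
  nweek = rep(rep(1:5, 3), 2),
  firm  = rep(rep(letters[1:3], each = 5), 2),
  quant = rnorm(30),
  price = runif(30),
  # hypothetical non-numeric column that differs between day 1 and day 2
  note  = rep(c("open", "close"), each = 15),
  stringsAsFactors = FALSE
)

# `!` cannot negate a function object; this reproduces the message from the question
negation_error <- tryCatch(
  !(is.numeric),
  error = function(e) conditionMessage(e)
)

weekly <- data %>%
  arrange(firm, nweek, day) %>%   # day 1 comes first within each firm-week
  group_by(firm, nweek) %>%
  summarise(
    across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),  # day, quant, price
    across(!where(is.numeric), first),                    # note (grouping columns are skipped)
    .groups = "drop"
  )

print(negation_error)
print(weekly)
```

Grouping columns (firm and nweek) are never touched by across, so only note is picked up by the second call. With .groups = "drop" the result is returned ungrouped, which differs from the output shown above, where the result is still grouped by firm.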