I would like to perform some operations using dplyr on a dataset that looks like:
data <- data.frame(day = c(rep(1, 15), rep(2, 15)), nweek = rep(rep(1:5, 3), 2),
                   firm = rep(sapply(letters[1:3], function(x) rep(x, 5)), 2),
                   quant = rnorm(30), price = runif(30))
where each observation is at the day, week and firm level (there are only 2 days in a week).
I would like to summarise the data (grouping by firm) by (1) taking the average across the days of the week for the numeric variables (i.e., quant and price), and (2) taking the first entry for the non-numeric variables. In this example that is only firm, but in my real dataset I have multiple non-numeric variables (Date and character) and they may change within a week (nweek), so I would like to keep only the entry from the first day of the week for all the non-numeric variables.
I tried using summarise and across but get an error:
> data %>% group_by(firm, nweek) %>% dplyr::summarise(across(which(sapply(data, is.numeric)), ~ mean(.x, na.rm = TRUE)),
+ across(which(sapply(data, !(is.numeric))), ~ head(.x, 1))
+ )
Error: Problem with `summarise()` input `..2`.
x invalid argument type
ℹ Input `..2` is `across(which(sapply(data, !(is.numeric))), ~head(.x, 1))`.
Run `rlang::last_error()` to see where the error occurred.
Any help?
I don't know what your expected output should look like, but something like this should achieve what you are after:
data %>%
  group_by(firm, nweek) %>%
  summarise(
    across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
    across(!where(is.numeric), ~ head(.x, 1))
  )
As a sidenote, instead of using which(sapply(...)), have a look at the where helper for conditional selection of variables inside across in this post.
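On the error itself: `!(is.numeric)` tries to negate the function object rather than its result, and R rejects that with "invalid argument type". The base R sketch below reproduces the message and shows one way to negate the predicate's result instead. The `set.seed()` call and the names `err` and `non_numeric` are additions for illustration only and are not part of the question.

```r
# Toy data from the question (seed is an assumption, added for reproducibility)
set.seed(1)
data <- data.frame(
  day   = c(rep(1, 15), rep(2, 15)),
  nweek = rep(rep(1:5, 3), 2),
  firm  = rep(sapply(letters[1:3], function(x) rep(x, 5)), 2),
  quant = rnorm(30),
  price = runif(30)
)

# `!` is applied to the function itself here, which is what fails
err <- tryCatch(
  sapply(data, !(is.numeric)),
  error = function(e) conditionMessage(e)
)
err
# "invalid argument type"

# Negate the result of the predicate, column by column, instead
non_numeric <- sapply(data, function(col) !is.numeric(col))
names(data)[non_numeric]
# "firm"
```

Inside across(), `!where(is.numeric)` expresses the same selection without the sapply() step.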
Output of the summarise() pipeline:
# A tibble: 15 x 5
# Groups:   firm [3]
   firm  nweek   day   quant price
   <chr> <int> <dbl>   <dbl> <dbl>
 1 a         1   1.5 -0.336  0.903
 2 a         2   1.5  0.0837 0.579
 3 a         3   1.5  0.0541 0.425
 4 a         4   1.5  1.21   0.555
 5 a         5   1.5  0.462  0.806
 6 b         1   1.5  0.0493 0.346
 7 b         2   1.5  0.635  0.596
 8 b         3   1.5  0.406  0.583
 9 b         4   1.5 -0.707  0.205
10 b         5   1.5  0.157  0.816
11 c         1   1.5  0.728  0.271
12 c         2   1.5  0.117  0.775
13 c         3   1.5 -1.05   0.234
14 c         4   1.5 -1.35   0.290
15 c         5   1.5  0.771  0.310
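Note that in the toy data the only non-numeric column is firm, which is a grouping variable, and across() skips grouping variables. The second across() therefore has nothing to select in that example. The sketch below adds a hypothetical character column `note` (not in the question) that changes within a week, to show the "first entry" rule doing some work. It also assumes that "first day of the week" means the smallest `day` value, so the rows are sorted with arrange() before grouping. The seed and `.groups = "drop"` are likewise additions for illustration.

```r
library(dplyr)

set.seed(1)  # assumption: seed added only for reproducibility
data <- data.frame(
  day   = c(rep(1, 15), rep(2, 15)),
  nweek = rep(rep(1:5, 3), 2),
  firm  = rep(sapply(letters[1:3], function(x) rep(x, 5)), 2),
  quant = rnorm(30),
  price = runif(30)
)

# Hypothetical non-numeric column whose value differs between the two days
data$note <- paste0("day", data$day)

res <- data %>%
  arrange(day) %>%                 # make sure day 1 comes first in each group
  group_by(firm, nweek) %>%
  summarise(
    across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),  # day, quant, price
    across(!where(is.numeric), ~ head(.x, 1)),            # note
    .groups = "drop"
  )

unique(res$note)
# "day1"
```

If the real data are already ordered by day within each firm and week, the arrange() step can probably be dropped.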