In my dataframe, I have multiple columns with student grades. I want to sum the "Quiz" columns (e.g., Quiz1, Quiz2). However, I only want to sum the top 2 values, and ignore the others. I want to create a new column with the total (i.e., the sum of the top 2 values).
One issue is that some students have grades that tie for the top 2 grades in a given row. For example, Aaron has a high score of 42, but then there are two scores that tie for the second highest (i.e., 36).
Data
df <-
structure(
list(
Student = c("Aaron", "James", "Charlotte", "Katie", "Olivia",
"Timothy", "Grant", "Chloe", "Judy", "Justin"),
ID = c(30016, 87311, 61755, 55323, 94839, 38209, 34096,
98432, 19487, 94029),
Quiz1 = c(31, 25, 41, 10, 35, 19, 27, 42, 15, 20),
Quiz2 = c(42, 33, 34, 22, 23, 38, 48, 49, 23, 30),
Quiz3 = c(36, 36, 34, 32, 43, 38, 44, 42, 42, 37),
Quiz4 = c(36, 43, 39, 46, 40, 38, 43, 35, 41, 41)
),
row.names = c(NA, -10L),
class = c("tbl_df", "tbl", "data.frame")
)
I know that I can use pivot_longer
to do this, which allows me to arrange by group, then take the top 2 values for each student. This works fine, but I would like a more efficient way with tidyverse
, rather than having to pivot back and forth.
What I Tried
library(tidyverse)
df %>%
pivot_longer(-c(Student, ID)) %>%
group_by(Student, ID) %>%
arrange(desc(value), .by_group = TRUE) %>%
slice_head(n = 2) %>%
pivot_wider(names_from = name, values_from = value) %>%
ungroup() %>%
mutate(Total = rowSums(select(., starts_with("Quiz")), na.rm = TRUE))
I also know that if I wanted to sum all the columns on each row, then I could use rowSums
, as I made use of above. However, I am unsure how to do rowSums
of just the top 2 values in the 4 quiz columns.
Expected Output
# A tibble: 10 × 7
Student ID Quiz2 Quiz3 Quiz1 Quiz4 Total
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Aaron 30016 42 36 NA NA 78
2 Charlotte 61755 NA NA 41 39 80
3 Chloe 98432 49 NA 42 NA 91
4 Grant 34096 48 44 NA NA 92
5 James 87311 NA 36 NA 43 79
6 Judy 19487 NA 42 NA 41 83
7 Justin 94029 NA 37 NA 41 78
8 Katie 55323 NA 32 NA 46 78
9 Olivia 94839 NA 43 NA 40 83
10 Timothy 38209 38 38 NA NA 76
You do not have to do pivot_wider
. Note that the longer format is the tidy format. Just do pivot_longer
and left_join
:
df %>%
left_join(pivot_longer(., -c(Student, ID)) %>%
group_by(Student, ID) %>%
summarise(Total = sum(sort(value, TRUE)[1:2]), .groups = 'drop'))
# A tibble: 10 x 7
Student ID Quiz1 Quiz2 Quiz3 Quiz4 Total
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Aaron 30016 31 42 36 36 78
2 James 87311 25 33 36 43 79
3 Charlotte 61755 41 34 34 39 80
4 Katie 55323 10 22 32 46 78
5 Olivia 94839 35 23 43 40 83
6 Timothy 38209 19 38 38 38 76
7 Grant 34096 27 48 44 43 92
8 Chloe 98432 42 49 42 35 91
9 Judy 19487 15 23 42 41 83
10 Justin 94029 20 30 37 41 78