Search code examples
dplyrmutate

Remove lowest values in a Row


I am a teacher and have a large student dataset with 5 quizzes and two other assignments. While each quiz max score is 30, the other two assignments have a max score of 100. While calculating the quiz scores, I want the final quiz total to exclude the lowest quiz score for each student (i.e., only the top four quiz scores should count).

Using an earlier answer on this forum, I have tried to use the following to achieve this:

df %\>% pmap_dfr(\~c(...) %\>% replace(rank(desc(.)) \> 4, 0)). This applies to the whole data and removes the lowest 3 quiz values (as the other assignments values are invariably more than 50 since out of 100)

I have tried Ipmap.dfr (\~c("q1","q2","q3","q4","q5") %\>% replace(rank(desc(.)) \> 4, 0)), which throws an error.

How do I restrict the function to only selected columns (i.e., quiz columns)?


Solution

  • You likely will need to lengthen your data based on student name (or another identifer like an ID number or email). Based on this RStudio Community post.

    library(tidyverse)
    set.seed(123)
    df <- 
      structure(
        list(
          Student = c("Alice", "Betty", "Frank", "Katie", "Louise"),
          Other1 = sample(1:100, 5, replace = TRUE),
          Other2 = sample(1:100, 5, replace = TRUE),
          Quiz1 = sample(1:30, 5, replace = TRUE),
          Quiz2 = sample(1:30, 5, replace = TRUE),
          Quiz3 = sample(1:30, 5, replace = TRUE),
          Quiz4 = sample(1:30, 5, replace = TRUE),
          Quiz5 = sample(1:30, 5, replace = TRUE)
        ),
        row.names = c(NA, -5L),
        class = c("tbl_df", "tbl", "data.frame")
      )
    
    df2 <- df %>%
      pivot_longer(cols = -c(Student, Other1, Other2)) %>%
      group_by(Student) %>%
      arrange(value) %>%
      slice_tail(n = 4) %>%
      pivot_wider(id_cols = c(Student, Other1, Other2)) %>%
      rowwise() %>%
      mutate(QuizTotal = mean(c(Quiz1, Quiz2, Quiz3, Quiz4, Quiz5), na.rm = TRUE)) %>%
      ungroup()
    
    head(df2)
    #> # A tibble: 5 × 9
    #>   Student Other1 Other2 Quiz4 Quiz5 Quiz2 Quiz1 Quiz3 QuizTotal
    #>   <chr>    <int>  <int> <int> <int> <int> <int> <int>     <dbl>
    #> 1 Alice       31     42     9    17    25    26    NA      19.2
    #> 2 Betty       79     50    30    11    28    27    NA      24  
    #> 3 Frank       51     43    19     7    25    NA    26      19.2
    #> 4 Katie       14     14    NA    21     9    19     7      14  
    #> 5 Louise      67     25    14    12    29    27    NA      20.5