Search code examples
rreshapetidyrdata-cleaningcoalesce

R coalesce down columns by identifer


I have a long dataset with student grades and courses going across many semesters. It has many NAs and many rows for each student. I want it to have one long row per student to fill in those NAs but keep the same column names.

Here's a sample:

library(tidyverse)
sample <- tibble(student = c("Corey", "Corey", "Sibley", "Sibley"),
                 fall_course_1 = c("Math", NA, "Science", NA),
                 fall_course_2 = c(NA, "English", NA, NA),
                 fall_grade_1 = c(90, NA, 98, NA),
                 fall_grade_2 = c(NA, 60, NA, NA))

And here's what I'd like it to look like:

library(tidyverse)
answer <- tibble(student = c("Corey", "Sibley"),
                 fall_course_1 = c("Math", "Science"),
                 fall_course_2 = c("English", NA),
                 fall_grade_1 = c(90, 98),
                 fall_grade_2 = c(60, NA))

Some semesters, some students take many classes and some just one. I've tried using coalesce(), but I can't figure it out. Any help would be appreciated!


Solution

  • You could get the first non-NA value in each column for each student.

    library(dplyr)
    sample %> group_by(student) %>% summarise_all(~na.omit(.)[1])
    
    # A tibble: 2 x 5
    #  student fall_course_1 fall_course_2 fall_grade_1 fall_grade_2
    #  <chr>   <chr>         <chr>                <dbl>        <dbl>
    #1 Corey   Math          English                 90           60
    #2 Sibley  Science       NA                      98           NA
    

    This approach returns NA if there are all NA values in a group.