I have a long dataset with student grades and courses going across many semesters. It has many NAs and many rows for each student. I want it to have one long row per student to fill in those NAs but keep the same column names.
Here's a sample:
library(tidyverse)
sample <- tibble(student = c("Corey", "Corey", "Sibley", "Sibley"),
fall_course_1 = c("Math", NA, "Science", NA),
fall_course_2 = c(NA, "English", NA, NA),
fall_grade_1 = c(90, NA, 98, NA),
fall_grade_2 = c(NA, 60, NA, NA))
And here's what I'd like it to look like:
library(tidyverse)
answer <- tibble(student = c("Corey", "Sibley"),
fall_course_1 = c("Math", "Science"),
fall_course_2 = c("English", NA),
fall_grade_1 = c(90, 98),
fall_grade_2 = c(60, NA))
Some semesters, some students take many classes and some just one. I've tried using coalesce(), but I can't figure it out. Any help would be appreciated!
You could get the first non-NA value in each column for each student
.
library(dplyr)
sample %> group_by(student) %>% summarise_all(~na.omit(.)[1])
# A tibble: 2 x 5
# student fall_course_1 fall_course_2 fall_grade_1 fall_grade_2
# <chr> <chr> <chr> <dbl> <dbl>
#1 Corey Math English 90 60
#2 Sibley Science NA 98 NA
This approach returns NA
if there are all NA
values in a group.