I am trying to merge values from different columns that reflect various time points. Below is example data:
id wave1 wave2 wave3 wave4
1 12 NA NA NA
1 NA 32 NA NA
1 NA NA 45 NA
1 NA NA NA 84
2 67 NA NA NA
2 NA NA NA 54
I want:
id wave1 wave2 wave3 wave4
1 12 32 45 84
2 67 NA NA 54
Does anyone know how I might do this? I tried using dplyr, but I wasn't able to get my code to work.
If you can guarantee that there is never more than one non-NA
value, then you can do this:
library(dplyr)
quux %>%
group_by(id) %>%
summarize(across(everything(), ~ na.omit(.)[1]))
# # A tibble: 2 × 5
# id wave1 wave2 wave3 wave4
# <int> <int> <int> <int> <int>
# 1 1 12 32 45 84
# 2 2 67 NA NA 54
However, if you think there may be 2 or more in an id/wave pair, then we can pivot/filter/pivot:
quux$wave2[1] <- 31L # just to prove the point
library(tidyr) # pivot_*
quux %>%
pivot_longer(cols = -id) %>%
filter(!is.na(value)) %>%
mutate(rn = row_number(), .by = c("id", "name")) %>%
pivot_wider(id_cols = c(id, rn), names_from = name, values_from = value) %>%
select(-rn)
# # A tibble: 3 × 5
# id wave1 wave2 wave3 wave4
# <int> <int> <int> <int> <int>
# 1 1 12 31 45 84
# 2 1 NA 32 NA NA
# 3 2 67 NA NA 54
Data
quux <- structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L), wave1 = c(12L, NA, NA, NA, 67L, NA), wave2 = c(NA, 32L, NA, NA, NA, NA), wave3 = c(NA, NA, 45L, NA, NA, NA), wave4 = c(NA, NA, NA, 84L, NA, 54L)), class = "data.frame", row.names = c(NA, -6L))