Search code examples
rdataframetidyverse

merging values across columns


I am trying to merge values from different columns that reflect various time points. Below is example data:

id    wave1   wave2      wave3     wave4
1       12      NA        NA        NA  
1       NA      32        NA        NA  
1       NA      NA        45        NA   
1       NA      NA        NA        84
2       67      NA        NA        NA  
2       NA      NA        NA        54

I want:

id    wave1   wave2      wave3     wave4
1       12      32        45        84  
2       67      NA        NA        54  

Does anyone know how I might do this? I tried using dplyr, but I wasn't able to get my code to work.


Solution

  • If you can guarantee that there is never more than one non-NA value, then you can do this:

    library(dplyr)
    quux %>%
      group_by(id) %>%
      summarize(across(everything(), ~ na.omit(.)[1]))
    # # A tibble: 2 × 5
    #      id wave1 wave2 wave3 wave4
    #   <int> <int> <int> <int> <int>
    # 1     1    12    32    45    84
    # 2     2    67    NA    NA    54
    

    However, if you think there may be 2 or more in an id/wave pair, then we can pivot/filter/pivot:

    quux$wave2[1] <- 31L # just to prove the point
    library(tidyr) # pivot_*
    quux %>%
      pivot_longer(cols = -id) %>%
      filter(!is.na(value)) %>%
      mutate(rn = row_number(), .by = c("id", "name")) %>%
      pivot_wider(id_cols = c(id, rn), names_from = name, values_from = value) %>%
      select(-rn)
    # # A tibble: 3 × 5
    #      id wave1 wave2 wave3 wave4
    #   <int> <int> <int> <int> <int>
    # 1     1    12    31    45    84
    # 2     1    NA    32    NA    NA
    # 3     2    67    NA    NA    54
    

    Data

    quux <- structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L), wave1 = c(12L, NA, NA, NA, 67L, NA), wave2 = c(NA, 32L, NA, NA, NA, NA), wave3 = c(NA, NA, 45L, NA, NA, NA), wave4 = c(NA, NA, NA, 84L, NA, 54L)), class = "data.frame", row.names = c(NA, -6L))