Collapsing data per family


I have this data set, with values for twins within families:

zyg   fid    x_t1    x_t2     y_t1   y_t2
 1 499474     NA     1      1    NA
 1 499474     NA     NA    NA    NA
 1 499474     NA     NA    NA     1
 1 499474     NA     NA    NA    NA
 1 499540     NA     NA     1    NA
 1 499540     NA     NA    NA    NA
 2 499874     NA     NA    NA    NA
 2 499874     NA     NA     1    NA
 2 499874     NA     NA    NA     1
 2 499874     2      NA    NA     1 
  • How do I collapse each family into a single row, retaining the phenotype information for x and y wherever it is present?

The expected output for family 499474 is:

zyg   fid    x_t1    x_t2  y_t1   y_t2
 1 499474     NA     1      1     1

and for family 499874, it should be:

 2 499874     2      NA    1     1 

Solution

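  • Group by `fid` and, within each family, take the first non-missing value of every other column (a column that is entirely `NA` within a family stays `NA`):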

    library(dplyr)
    
    df %>%
      group_by(fid) %>%
      summarise_all(~first(na.omit(.)))
    

    Output:

    # A tibble: 3 × 6
         fid   zyg  x_t1  x_t2  y_t1  y_t2
       <int> <int> <int> <int> <int> <int>
    1 499474     1    NA     1     1     1
    2 499540     1    NA    NA     1    NA
    3 499874     2     2    NA     1     1
    

    Your data:

    df <- structure(list(
      zyg = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
      fid = c(499474L, 499474L, 499474L, 499474L, 499540L, 499540L,
              499874L, 499874L, 499874L, 499874L),
      x_t1 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 2L),
      x_t2 = c(1L, NA, NA, NA, NA, NA, NA, NA, NA, NA),
      y_t1 = c(1L, NA, NA, NA, 1L, NA, NA, 1L, NA, NA),
      y_t2 = c(NA, NA, 1L, NA, NA, NA, NA, NA, 1L, 1L)
    ), class = "data.frame", row.names = c(NA, -10L))
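
As a side note, `summarise_all()` is superseded in dplyr 1.0.0 and later in favour of `across()`. The sketch below shows how the same collapse might be written in that style. It assumes dplyr >= 1.0.0 is installed; the name `collapsed` is only illustrative, and the data frame is rebuilt inline so the snippet runs on its own.

```r
library(dplyr)

# Same data as in the question, rebuilt with data.frame() so the example is self-contained
df <- data.frame(
  zyg  = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  fid  = c(499474L, 499474L, 499474L, 499474L, 499540L, 499540L,
           499874L, 499874L, 499874L, 499874L),
  x_t1 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 2L),
  x_t2 = c(1L, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  y_t1 = c(1L, NA, NA, NA, 1L, NA, NA, 1L, NA, NA),
  y_t2 = c(NA, NA, 1L, NA, NA, NA, NA, NA, 1L, 1L)
)

# One row per family: for every non-grouping column, keep the first
# non-NA value; first() returns NA when nothing is left after na.omit()
collapsed <- df %>%
  group_by(fid) %>%
  summarise(across(everything(), ~ first(na.omit(.x))), .groups = "drop")

print(collapsed)
```

Both versions keep only the first non-missing value per column, so if a family had two different non-missing values in the same column, the later one would be silently dropped. With the data shown here that situation does not arise.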