Search code examples
rrowdata-cleaning

How to combine rows in R to get most complete data in single row?


I have a large data frame that looks like this:

#a<- c('Bill', 'Bill', 'Jean', 'Jean', 'Jean', 'Louis')
#b<- c("A", "A", "B", NA, "B", NA)
#c<- c(0, 1, 0, 0, 0, 1)
#d<- c(1, 0, 1, 1, 0, 1)
#e<- c(1, 1, 0, 0, 1, 1)
#f<- data.frame(a, b, c, d, e)
#colnames(f)<- c("name", "letter", "red", "blue", "green")
 
   name letter red blue green
1  Bill      A   0    1     1
2  Bill      A   1    0     1
3  Jean      B   0    1     0
4  Jean   <NA>   0    1     0
5  Jean      B   0    0     1
6 Louis   <NA>   1    1     1

I would like to combine the rows so that each individual (i.e., Bill, Jean, and Louis) are only represented by one row. However, I would like to have the information in that one row be as complete as possible. I would like it to look like the following:

   name letter red blue green
1  Bill      A   1    1     1
2  Jean      B   0    1     1
3 Louis   <NA>   1    1     1

My only thought on how to achieve this so far involves a long loop where I check a bunch of conditions before deciding which element of each row to keep, but I was wondering if there's a faster way to do this with some preexisting function? There's no pattern to the missingness in the rows to exploit. Any guidance appreciated.


Solution

  • You can use summarise:

    library(dplyr)
    f %>% 
      group_by(name) %>% 
      summarise(across(everything(), max, na.rm = T))
    
      name  letter   red  blue green
      <chr> <chr>  <dbl> <dbl> <dbl>
    1 Bill  A          1     1     1
    2 Jean  B          0     1     1
    3 Louis NA         1     1     1