Search code examples
rdatatable

R data table how to unite rows with error check


I have next data table dataframe

library(dplyr)
library(data.table)

my_data = data.frame(
  id = c(1, 1, 2, 2, 3),
  sample_number = c('d1', 'rr1', 'd2', 'rr2', 'd3'),
  res_1 = c('AA', NA, NA, 'GG', 'AG'),
  res_2 = c(NA, 'TT', 'CC', NA, 'TC'),
  res_3 = c('II', 'II', 'DD', 'ID', 'ID')
)
my_data <- my_data %>% as.data.table() ## convert to data table
> my_data
  id sample_number res_1 res_2 res_3
1  1            d1    AA   <NA>   II
2  1           rr1   <NA>   TT    II
3  2            d2   <NA>   CC    DD
4  2           rr2    GG   <NA>   ID
5  3            d3    AG    TC    ID

Uniq column is id. For some id exists 2 rows with different values in sample_number column. How can I unite rows by id column? For id 2 in column res_3 mistake exists. In that case result of unite will by '---'. Result is next

id  sample_number   res_1 res_2   res_3
1   d1, rr1         AA    TT      II
2   d2, rr2         GG    CC      '---'
3   d3              AG    TC      ID

Solution

  • a data.table approach

    my_data[, sample_number := paste0(sample_number, collapse = ", "), by = .(id)]
    DT <- melt(my_data, id.vars = c("id", "sample_number"), na.rm = TRUE)
    dcast(DT, id + sample_number ~ variable, value.var = "value", 
          fun.aggregate = function(x) ifelse(length(unique(x)) > 1, "---", x))
    #    id sample_number res_1 res_2 res_3
    # 1:  1       d1, rr1    AA    TT    II
    # 2:  2       d2, rr2    GG    CC   ---
    # 3:  3            d3    AG    TC    ID