I have next data table dataframe
library(dplyr)
library(data.table)
my_data = data.frame(
id = c(1, 1, 2, 2, 3),
sample_number = c('d1', 'rr1', 'd2', 'rr2', 'd3'),
res_1 = c('AA', NA, NA, 'GG', 'AG'),
res_2 = c(NA, 'TT', 'CC', NA, 'TC'),
res_3 = c('II', 'II', 'DD', 'ID', 'ID')
)
my_data <- my_data %>% as.data.table() ## convert to data table
> my_data
id sample_number res_1 res_2 res_3
1 1 d1 AA <NA> II
2 1 rr1 <NA> TT II
3 2 d2 <NA> CC DD
4 2 rr2 GG <NA> ID
5 3 d3 AG TC ID
Uniq column is id
. For some id
exists 2 rows with different values in sample_number
column. How can I unite rows by id column?
For id 2 in column res_3
mistake exists. In that case result of unite will by '---'. Result is next
id sample_number res_1 res_2 res_3
1 d1, rr1 AA TT II
2 d2, rr2 GG CC '---'
3 d3 AG TC ID
a data.table
approach
my_data[, sample_number := paste0(sample_number, collapse = ", "), by = .(id)]
DT <- melt(my_data, id.vars = c("id", "sample_number"), na.rm = TRUE)
dcast(DT, id + sample_number ~ variable, value.var = "value",
fun.aggregate = function(x) ifelse(length(unique(x)) > 1, "---", x))
# id sample_number res_1 res_2 res_3
# 1: 1 d1, rr1 AA TT II
# 2: 2 d2, rr2 GG CC ---
# 3: 3 d3 AG TC ID