I have duplicate rows in a data.table - see below. A couple of the variables are lists.
ID l29 s14 s2 s7 s71 s91
12345 NULL 3 NULL NULL NULL NULL
12345 NULL NULL 1 NULL NULL NULL
12345 NULL NULL NULL 2 NULL NULL
12345 NULL NULL NULL NULL c(4.5, 8, 9, 10) NULL
12345 NULL NULL NULL NULL c(4.5, 8, 9, 10) NULL
12345 NULL NULL NULL NULL c(4.5, 8, 9, 10) NULL
12345 NULL NULL NULL NULL NULL c(6, 7)
12345 11 NULL NULL NULL NULL NULL
12345 NULL NULL NULL NULL c(4.5, 8, 9, 10) NULL
12345 NULL NULL NULL NULL NULL c(6, 7)
I would like to summarize the above into one row as below:
12345 11 3 1 2 c(4.5, 8, 9, 10) c(6, 7)
Someone asked me to dput the data, I hope this does it?
structure(list(SSN_TAX_ID = c("12345", "12345", "12345", "12345",
"12345", "12345", "12345", "12345", "12345", "12345"), L29 = list(
NULL, NULL, NULL, NULL, NULL, NULL, NULL, 11, NULL, NULL),
S14 = list(3, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL), S2 = list(NULL, 1, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL), S7 = list(NULL, NULL, 2, NULL, NULL,
NULL, NULL, NULL, NULL, NULL), S71 = list(NULL, NULL,
NULL, c(4.5, 8, 9, 10), c(4.5, 8, 9, 10), c(4.5, 8, 9,
10), NULL, NULL, c(4.5, 8, 9, 10), NULL), S91 = list(
NULL, NULL, NULL, NULL, NULL, NULL, c(6, 7), NULL, NULL,
c(6, 7))), .Names = c("SSN_TAX_ID", "L29", "S14", "S2",
"S7", "S71", "S91"), class = "data.frame", row.names = c(NA,
-10L))
This is pretty messy, but:
setDT(DT) # since the OP posted a data frame
DT[, lapply(.SD, function(x) unique(Filter(Negate(is.null), x)))]
# SSN_TAX_ID L29 S14 S2 S7 S71 S91
# 1: 12345 11 3 1 2 4.5, 8.0, 9.0,10.0 6,7
Filter(Negate(is.null), x)
drops the NULLs from each column before we uniquify.
In the special case of the OP's example, using Reduce
with union
also happens to work:
DT[, lapply(.SD, function(x) .(Reduce(union, x)))]