I have a data set with 31557 observations, and the variables Order.number and Materials. I'm trying to run this in R:
First:
library(data.table)
DT <- data.table(Order.number=X$Order.number, Materials=X$Materials)
setorder(DT, Order.number, Materials)
Then:
ans <- DT[, as.data.table(do.call(rbind, combn(Materials, 2, simplify=FALSE))),
          by=Order.number][,
          .N, by=.(V1, V2)]
But I get this error: Error in combn(Materials, 2, simplify = FALSE) : n < m
It works if I use a randomly generated table, so could it be something to do with my dataset?
EDIT: Based on what the combn error means, I tried the following, but now I get "Error in do.call(rbind, function(x) if (length(x) > 1) { : second argument must be a list":
ans <- DT[, as.data.table(do.call(rbind, function(x)
              if(length(x)>1) {
                combn(Materials, 2, simplify=FALSE)
              }
              else x)),
          by=Order.number][,
          .N, by=.(V1, V2)]
At least one value of the grouping variable Order.number in your DT gives a group with fewer than 2 rows, so combn(Materials, 2, ...) complains that n < m: it cannot choose 2 items from a vector of length 1.
You can easily diagnose which groups have length 1 with DT[, .N, by=Order.number][N==1].
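As a minimal sketch of that diagnosis, here is a small made-up table (the values are hypothetical, not taken from your data) in which order 3 has a single material. The name short_groups is only for illustration.

```r
library(data.table)

# Hypothetical toy data: order 3 has only one material
DT <- data.table(
  Order.number = c(1, 1, 1, 2, 2, 3),
  Materials    = c("A", "B", "C", "A", "B", "C")
)
setorder(DT, Order.number, Materials)

# Count rows per order, then keep the orders with fewer than 2 rows;
# these are the groups on which combn(Materials, 2) fails
short_groups <- DT[, .N, by = Order.number][N < 2]
print(short_groups)
```

Any order listed in short_groups is one that would trigger the n < m error.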
Then either exclude those from your summary, or write a wrapper for combn that does nothing when the input length n < m.
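Both options could look roughly like the sketch below, again on hypothetical toy data. safe_pairs is a made-up helper name, not part of data.table or base R. It relies on data.table skipping a group when j returns NULL. Note that, unlike the attempt in the question's edit, the function is actually called here, so do.call receives the list produced by combn rather than a function object.

```r
library(data.table)

# Hypothetical toy data: order 3 has only one material
DT <- data.table(
  Order.number = c(1, 1, 1, 2, 2, 3),
  Materials    = c("A", "B", "C", "A", "B", "C")
)
setorder(DT, Order.number, Materials)

# Option 1: a wrapper (hypothetical name) that returns NULL when the
# input is too short, so that group contributes no rows
safe_pairs <- function(x, m = 2) {
  if (length(x) < m) return(NULL)
  as.data.table(do.call(rbind, combn(x, m, simplify = FALSE)))
}
ans1 <- DT[, safe_pairs(Materials), by = Order.number][
  , .N, by = .(V1, V2)]

# Option 2: exclude short groups inline with a guard on .N
ans2 <- DT[, if (.N >= 2)
             as.data.table(do.call(rbind, combn(Materials, 2, simplify = FALSE))),
           by = Order.number][
  , .N, by = .(V1, V2)]

print(ans1)
```

Both variants should give the same pair counts; which one to prefer is mostly a matter of taste, though the wrapper is easier to reuse.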
(Arguably combn should have a non-default option to suppress this error when applied to groups of length n < 2, since that is likely to happen when it is run on a grouped df/dt.)