
combn on grouped DT returns error "n < m"


I have a data set with 31557 observations and the variables Order.number and Materials. I'm trying to run this in R:

First:

library(data.table)
DT <- data.table(Order.number=X$Order.number, Materials=X$Materials)
setorder(DT, Order.number, Materials)

Then:

ans <- DT[, as.data.table(do.call(rbind, combn(Materials, 2, simplify=FALSE))),
          by=Order.number][, .N, by=.(V1, V2)]

But I get the error:

Error in combn(Materials, 2, simplify = FALSE) : n < m

It works if I use a randomly generated table, so could it be something to do with my dataset?
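For reference, the error can be reproduced without data.table at all. combn(x, m) stops with "n < m" whenever x has fewer than m elements, which is what happens for an order containing a single material. A minimal sketch (the material code "M1" is made up for illustration):

```r
# combn(x, m) requires length(x) >= m. A group holding a single
# character material has length 1 < 2, so combn() stops with "n < m".
msg <- tryCatch(
  combn("M1", 2, simplify = FALSE),
  error = function(e) conditionMessage(e)
)
print(msg)  # [1] "n < m"
```

A randomly generated table most likely had at least two rows for every order, which would explain why it ran without error.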

EDIT: Based on what the combn error means, I tried to skip short groups as below, but I get "Error in do.call(rbind, function(x) if (length(x) > 1) { : second argument must be a list":

ans <- DT[, as.data.table(do.call(rbind, function(x)
  if(length(x)>1) {
    combn(Materials, 2, simplify=FALSE)
  }
  else x)), 
  by=Order.number][,
  .N, by=.(V1, V2)]
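The EDIT fails for a different reason: do.call(what, args) expects args to be a list, but here the anonymous function(x) is itself passed as the second argument and is never applied to Materials. A minimal sketch of the presumably intended guard, on made-up toy data in which order 3 has only one material, puts the length check directly in j. When the condition is false, j evaluates to NULL and data.table drops that group from the result:

```r
library(data.table)

# Hypothetical toy data: order 3 has only one material.
DT <- data.table(Order.number = c(1, 1, 1, 2, 2, 3),
                 Materials    = c("A", "B", "C", "A", "B", "C"))
setorder(DT, Order.number, Materials)

# Check the length of Materials itself. combn(..., simplify = FALSE)
# returns a list of pairs, which is what do.call() needs as its 2nd argument.
ans <- DT[, if (length(Materials) > 1) {
              as.data.table(do.call(rbind, combn(Materials, 2, simplify = FALSE)))
            },
          by = Order.number][, .N, by = .(V1, V2)]
print(ans)
```

Here the pair A/B occurs in orders 1 and 2, so it is counted twice, while order 3 contributes nothing.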

Solution

  • Clearly some value of the grouping variable Order.number in your DT gives a group of length 1 or less, hence combn(Materials, 2, ...) complains that n < m: it cannot choose 2 elements out of 1.

    You can easily find which groups have length 1 with DT[, .N, by=Order.number][N==1].

    Then either exclude those orders from your summary, or write a wrapper around combn that returns nothing (NULL) when the input length n < m.

    (Arguably combn should have an enhancement: a non-default option to selectively squelch the error when applied to groups of length n < 2, as is likely to happen when it is run on a grouped df/dt.)
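The diagnosis and both options above can be sketched as follows, again on made-up toy data in which order 3 has a single material. pairs_or_null is a hypothetical helper name, not part of data.table or base R. Checking length() before calling combn also sidesteps a quieter trap: if Materials were numeric, a single positive whole number such as 5 would be expanded by combn to seq_len(5) and silently produce spurious pairs instead of an error.

```r
library(data.table)

# Hypothetical toy data: order 3 contains a single material.
DT <- data.table(Order.number = c(1, 1, 1, 2, 2, 3),
                 Materials    = c("A", "B", "C", "A", "B", "C"))
setorder(DT, Order.number, Materials)

# Diagnose: list the orders that have exactly one row.
singles <- DT[, .N, by = Order.number][N == 1]
print(singles)

# Option 1: exclude those orders, then pair as in the question.
ans_excl <- DT[!Order.number %in% singles$Order.number,
               as.data.table(do.call(rbind, combn(Materials, 2, simplify = FALSE))),
               by = Order.number][, .N, by = .(V1, V2)]

# Option 2: a wrapper (hypothetical name) that returns NULL when there are
# fewer than two materials; data.table drops groups whose j result is NULL.
# Testing length() first also avoids combn's seq_len() expansion of a
# single positive whole number.
pairs_or_null <- function(x) {
  if (length(x) < 2) return(NULL)
  as.data.table(do.call(rbind, combn(x, 2, simplify = FALSE)))
}
ans_wrap <- DT[, pairs_or_null(Materials), by = Order.number][, .N, by = .(V1, V2)]
print(ans_wrap)
```

Both options give the same pair counts; the wrapper has the advantage of not needing a separate pass to find the single-material orders.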