Search code examples
rsplitcutpartition

Split a group of integers into two subgroups of approximately the same suns


I have a group of integers, as in this R data.frame:

set.seed(1)
df <- data.frame(id = paste0("id",1:100), length = as.integer(runif(100,10000,1000000)), stringsAsFactors = F)

So each element has an id and a length.

I'd like to split df into two data.frames with approximately equal sums of length.

Any idea of an R function to achieve that?

I thought that Hmisc's cut2 might do it but I don't think that's its intended use:

library(Hmisc) # cut2
ll <- split(df, cut2(df$length, g=2))
> sum(ll[[1]]$length)
[1] 14702139
> sum(ll[[2]]$length)
[1] 37564671

Solution

  • It's called Bin pack problem. https://en.wikipedia.org/wiki/Bin_packing_problem this link may be helpful.

    Using BBmisc::binPack function,

    df$bins <- binPack(df$length, sum(df$length)/2 + 1)
    tapply(df$length, df$bins, sum)
    

    results like

           1        2        3 
    25019106 24994566    26346 
    

    Now since you want two groups,

    dummy$bins[dummy$bins == 3] <- 2 #because labeled as 2's sum is smaller
    

    result is

           1        2 
    25019106 25020912