I have a group of integers, as in this R
data.frame
:
set.seed(1)
df <- data.frame(id = paste0("id",1:100), length = as.integer(runif(100,10000,1000000)), stringsAsFactors = F)
So each element has an id
and a length
.
I'd like to split df into two data.frame
s with approximately equal sums of length
.
Any idea of an R
function to achieve that?
I thought that Hmisc
's cut2
might do it but I don't think that's its intended use:
library(Hmisc) # cut2
ll <- split(df, cut2(df$length, g=2))
> sum(ll[[1]]$length)
[1] 14702139
> sum(ll[[2]]$length)
[1] 37564671
It's called Bin pack problem. https://en.wikipedia.org/wiki/Bin_packing_problem this link may be helpful.
Using BBmisc::binPack
function,
df$bins <- binPack(df$length, sum(df$length)/2 + 1)
tapply(df$length, df$bins, sum)
results like
1 2 3
25019106 24994566 26346
Now since you want two groups,
dummy$bins[dummy$bins == 3] <- 2 #because labeled as 2's sum is smaller
result is
1 2
25019106 25020912