Search code examples
rbucket

Filling containers by volume given a bucket size


Suppose I have a dataset of serially numbered containers and their respective volumes.

x <- data.frame("SN" = 1:3, "Price" = c(10,20,30), "Volume" = c(100,150,200))

SN     Price      Volume
1      10         100
2      20         150
3      30         200

I am looking to fill the containers using a bucket of a given size.

  1. If the container is filled before the bucket is emptied, I'd like to move to the next SN.
  2. If the bucket is emptied before the container is filled, I'd like to start a new row with the remaining container.

Desired output with bucket_size = 200:

 SN     Price      Volume
    1      10         100 # max for SN 1 is 100, totally filled, bucket now = 100
    2      20         100 # max for SN 2 is 150, bucket now = 0 
    2      20          50 # fill remaining SN 2, new bucket now = 150 
    3      30         150 # max for SN 3 is 200, bucket now = 0
    3      30          50 # fill remaining in SN 3, bucket now = 150 remaining

I have started coding but it seems my code not generic enough to work for any bucket size.

x <- data.frame("SN" = 1:3, "Price" = c(10,20,30), "Volume" = c(100,150,200))

bucketsize <- 200
PendingBucketVolume <- bucketsize

y <- data.frame(SN = integer(),Price=numeric(),Volume=numeric(),stringsAsFactors=FALSE)

for (i in 1:nrow(x)) {
  if (x$Volume[i] <= PendingBucketVolume) {
    print(x$Volume[i])
    PendingBucketVolume <- PendingBucketVolume - x$Volume[i]
  } else {
    print(PendingBucketVolume)
    remainder <- x$Volume[i] - PendingBucketVolume
    if (remainder <= bucketsize) {
      print(remainder)
    } else {
      print(bucketsize)
      remainder <- remainder - bucketsize

    }

    if (remainder < PendingBucketVolume) {
      PendingBucketVolume <- remainder
    } else {
      PendingBucketVolume <- bucketsize
      PendingBucketVolume <- PendingBucketVolume - remainder
    }

  }
}

Suggestions to make it generic and efficient.


Solution

  • I spent way too long trying to get if else logic to work for this. There was too much balancing of row volumes and bucket volumes. Instead, I figured I could just break all the volumes out and assign them an ID, cbind them, and then use table to bring them back together. The result is probably a much slower calc than the if else method, but very simple to code.

    x <- data.frame("SN" = 1:3, "Price" = c(10,20,30), "Volume" = c(100,150,200))
    
    allocate_buckets <- function(x, bucketsize){
      # assumption that X has the colnames
      stopifnot(colnames(x) == c("SN","Price","Volume"))
      row_num <- rep(x[,"SN"], x[,"Volume"])
      l <- length(row_num)
      bucket_num <- rep(1:ceiling(l/bucketsize), each = bucketsize)[1:l]
      out <- table(row_num, bucket_num)
      out.ind <- which(out !=0, arr.ind = T)
      return(cbind.data.frame(x[out.ind[,1],c("SN","Price")], Volume = out[out.ind]))
    }
    

    So now you can use it for any (whole number) volume:

    allocate_buckets(x, 200)
    #    SN Price Volume
    #1    1    10    100
    #2    2    20    100
    #2.1  2    20     50
    #3    3    30    150
    #3.1  3    30     50
    
    allocate_buckets(x, 67)
    #    SN Price Volume
    #1    1    10     67
    #1.1  1    10     33
    #2    2    20     34
    #2.1  2    20     67
    #2.2  2    20     49
    #3    3    30     18
    #3.1  3    30     67
    #3.2  3    30     67
    #3.3  3    30     48
    

    EDIT

    Amazing link you posted, I was so close to this, here is the R version:

    x <- data.frame("SN" = 1:3, "Price" = c(10,20,30), "Volume" = c(100,150,200))
    y <- data.frame(SN = integer(), Price = numeric(), Volume = numeric())
    bucket <- bucketsize <- 200
    vol <- numeric()
    count <- 0
    for(i in 1:nrow(x)){
      volume <- x[i,"Volume"]
      while(volume!=0){
        vol <- min(volume, bucket)
        print(vol)
        count <- count + 1
        y[count,] <- x[i,]
        y[count,"Volume"] <- vol
        volume <- volume - vol
        bucket <- bucket - vol
        if(bucket == 0){
          bucket <- bucketsize
        }
      }
    }
    

    EDIT 2 I ran a microbenchmark test (took a while) on the two methods and the result was that my original method actually seems faster compared to the code transcribed from SAS.

                         expr      min        lq      mean    median        uq       max neval
     allocate_buckets(x, 200) 312.4177  466.6347  504.2121  483.1754  516.2977  846.4529   100
                other(x, 200) 986.6495 1233.5141 1339.4219 1265.3606 1389.1158 2023.7884   100
    

    This was unexpected to me. The benefit of the other method is that it can handle non-integer values. One could probably speed up the allocate_buckets function by using data.tables and the non-integer constraint could be lifted by multiplying by 100 or whatever number makes the smallest decimal become a whole number, and then dividing the result by 100 afterwards.