Suppose I have a dataset of serially numbered containers with their prices and volumes.
x <- data.frame("SN" = 1:3, "Price" = c(10,20,30), "Volume" = c(100,150,200))
  SN Price Volume
1  1    10    100
2  2    20    150
3  3    30    200
I am looking to fill the containers using a bucket of a given size.
Desired output with bucket_size = 200:
  SN Price Volume
1  1    10    100  # SN 1 holds 100; pour 100, bucket now = 100
2  2    20    100  # SN 2 holds 150; pour the remaining 100, bucket now = 0
3  2    20     50  # new bucket of 200; pour the last 50 of SN 2, bucket now = 150
4  3    30    150  # SN 3 holds 200; pour 150, bucket now = 0
5  3    30     50  # new bucket of 200; pour the last 50 of SN 3, bucket now = 150
I have started coding, but it seems my code is not generic enough to work for any bucket size.
x <- data.frame("SN" = 1:3, "Price" = c(10, 20, 30), "Volume" = c(100, 150, 200))
bucketsize <- 200
PendingBucketVolume <- bucketsize
y <- data.frame(SN = integer(), Price = numeric(), Volume = numeric(), stringsAsFactors = FALSE)
for (i in 1:nrow(x)) {
  if (x$Volume[i] <= PendingBucketVolume) {
    print(x$Volume[i])
    PendingBucketVolume <- PendingBucketVolume - x$Volume[i]
  } else {
    print(PendingBucketVolume)
    remainder <- x$Volume[i] - PendingBucketVolume
    if (remainder <= bucketsize) {
      print(remainder)
    } else {
      print(bucketsize)
      remainder <- remainder - bucketsize
    }
    if (remainder < PendingBucketVolume) {
      PendingBucketVolume <- remainder
    } else {
      PendingBucketVolume <- bucketsize
      PendingBucketVolume <- PendingBucketVolume - remainder
    }
  }
}
Any suggestions to make it generic and efficient?
I spent way too long trying to get if/else logic to work for this; there was too much balancing of row volumes against bucket volumes. Instead, I figured I could just break all the volumes out into single units, assign each unit a row ID and a bucket ID, and then use table to bring them back together. The result is probably a much slower calculation than the if/else method, but it is very simple to code.
x <- data.frame("SN" = 1:3, "Price" = c(10,20,30), "Volume" = c(100,150,200))
allocate_buckets <- function(x, bucketsize) {
  # assumes x has exactly these column names, in this order
  stopifnot(identical(colnames(x), c("SN", "Price", "Volume")))
  # one entry per unit of volume, labelled with its SN
  row_num <- rep(x[, "SN"], x[, "Volume"])
  l <- length(row_num)
  # consecutive bucket ids, assigned in blocks of bucketsize units
  bucket_num <- rep(1:ceiling(l / bucketsize), each = bucketsize)[1:l]
  # cross-tabulate: each non-zero cell is one (SN, bucket) allocation
  out <- table(row_num, bucket_num)
  out.ind <- which(out != 0, arr.ind = TRUE)
  cbind.data.frame(x[out.ind[, 1], c("SN", "Price")], Volume = out[out.ind])
}
So now you can use it for any whole-number bucket size:
allocate_buckets(x, 200)
#     SN Price Volume
# 1    1    10    100
# 2    2    20    100
# 2.1  2    20     50
# 3    3    30    150
# 3.1  3    30     50
allocate_buckets(x, 67)
#     SN Price Volume
# 1    1    10     67
# 1.1  1    10     33
# 2    2    20     34
# 2.1  2    20     67
# 2.2  2    20     49
# 3    3    30     18
# 3.1  3    30     67
# 3.2  3    30     67
# 3.3  3    30     48
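To make the mechanics visible, here is what the intermediate objects look like for bucketsize = 200 (a quick illustrative sketch using the same x as above, not part of the function itself):

```r
x <- data.frame(SN = 1:3, Price = c(10, 20, 30), Volume = c(100, 150, 200))
bucketsize <- 200

# one label per unit of volume: 100 ones, 150 twos, 200 threes
row_num <- rep(x$SN, x$Volume)
l <- length(row_num)  # 450 units in total

# consecutive bucket ids assigned in blocks of bucketsize units
bucket_num <- rep(1:ceiling(l / bucketsize), each = bucketsize)[1:l]

# cross-tabulating recovers the allocation: each non-zero cell says
# "this many units of SN <row> went into bucket <column>"
out <- table(row_num, bucket_num)
out
#        bucket_num
# row_num   1   2   3
#       1 100   0   0
#       2 100  50   0
#       3   0 150  50
```

Reading the non-zero cells column by column (which is the order `which(..., arr.ind = TRUE)` produces) gives exactly the five rows of the first result above.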
EDIT
Amazing link you posted; I was so close to this. Here is the R version:
x <- data.frame("SN" = 1:3, "Price" = c(10, 20, 30), "Volume" = c(100, 150, 200))
y <- data.frame(SN = integer(), Price = numeric(), Volume = numeric())
bucket <- bucketsize <- 200
count <- 0
for (i in 1:nrow(x)) {
  volume <- x[i, "Volume"]
  while (volume != 0) {
    vol <- min(volume, bucket)  # pour as much as fits in the current bucket
    count <- count + 1
    y[count, ] <- x[i, ]
    y[count, "Volume"] <- vol
    volume <- volume - vol
    bucket <- bucket - vol
    if (bucket == 0) {
      bucket <- bucketsize      # start a fresh bucket
    }
  }
}
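Wrapped up as a function for benchmarking (a sketch; the name `other` is just the label that appears in the benchmark output), the loop looks like:

```r
x <- data.frame(SN = 1:3, Price = c(10, 20, 30), Volume = c(100, 150, 200))

other <- function(x, bucketsize) {
  y <- data.frame(SN = integer(), Price = numeric(), Volume = numeric())
  bucket <- bucketsize
  count <- 0
  for (i in 1:nrow(x)) {
    volume <- x[i, "Volume"]
    while (volume != 0) {
      vol <- min(volume, bucket)  # pour as much as fits
      count <- count + 1
      y[count, ] <- x[i, ]
      y[count, "Volume"] <- vol
      volume <- volume - vol
      bucket <- bucket - vol
      if (bucket == 0) bucket <- bucketsize  # start a fresh bucket
    }
  }
  y
}

other(x, 200)  # SN/Price/Volume rows matching the desired output above
```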
EDIT 2
I ran a microbenchmark test (took a while) on the two methods, and the result was that my original method actually seems faster than the code transcribed from SAS.
                     expr      min        lq      mean    median        uq       max neval
 allocate_buckets(x, 200) 312.4177  466.6347  504.2121  483.1754  516.2977  846.4529   100
            other(x, 200) 986.6495 1233.5141 1339.4219 1265.3606 1389.1158 2023.7884   100
This was unexpected to me. The benefit of the other method is that it can handle non-integer values. One could probably speed up allocate_buckets by using data.table, and the whole-number constraint could be lifted by multiplying the volumes by 100 (or whatever factor makes the smallest decimal a whole number) and then dividing the resulting volumes by the same factor afterwards.
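A sketch of that scaling idea (the wrapper name and the default factor of 100 are my own choices; allocate_buckets is repeated here so the snippet runs on its own):

```r
# allocate_buckets from above, repeated so this snippet is self-contained
allocate_buckets <- function(x, bucketsize) {
  stopifnot(identical(colnames(x), c("SN", "Price", "Volume")))
  row_num <- rep(x[, "SN"], x[, "Volume"])
  l <- length(row_num)
  bucket_num <- rep(1:ceiling(l / bucketsize), each = bucketsize)[1:l]
  out <- table(row_num, bucket_num)
  out.ind <- which(out != 0, arr.ind = TRUE)
  cbind.data.frame(x[out.ind[, 1], c("SN", "Price")], Volume = out[out.ind])
}

# Hypothetical wrapper: scale volumes up to whole numbers, allocate,
# then scale the resulting volumes back down.
allocate_buckets_decimal <- function(x, bucketsize, scale = 100) {
  xs <- x
  xs$Volume <- round(xs$Volume * scale)  # e.g. 1.25 -> 125
  out <- allocate_buckets(xs, bucketsize * scale)
  out$Volume <- out$Volume / scale
  out
}

x2 <- data.frame(SN = 1:2, Price = c(10, 20), Volume = c(1.5, 2.25))
allocate_buckets_decimal(x2, 2)
# SN 1 gets 1.5; SN 2 gets 0.5 (filling the first bucket of 2), then 1.75
```

The factor only needs to be large enough to make every volume (and the bucket size) a whole number after rounding, so precision is limited to that chosen scale.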