If I want to sample numbers to create a vector I do:
set.seed(123)
x <- sample(1:100,200, replace = TRUE)
sum(x)
# [1] 10228
What if I want to sample 20 random numbers that sum to 100, and then 30 numbers that still sum to 100? I imagine this is more of a challenge than it seems. Neither ?sample nor searching Google has given me a clue, and I guess a loop that samples and then rejects any result not close enough (e.g. within 5) to the desired sum may take some time.
Is there a better way to achieve this?
An example would be:
foo(10,100) # ten random numbers that sum to 100. (not including zeros)
# 10,10,20,7,8,9,4,10,2,20
Here's another attempt. It doesn't use sample, but uses runif. I've added an optional "message" to the output showing the sum, which can be triggered using the showSum argument. There is also a Tolerance argument that specifies how close to the target the sum must be.
SampleToSum <- function(Target = 100, VecLen = 10,
                        InRange = 1:100, Tolerance = 2,
                        showSum = TRUE) {
  Res <- vector()
  while (TRUE) {
    # Gaps between sorted uniform draws sum to 1; scale to Target and round
    Res <- round(diff(c(0, sort(runif(VecLen - 1)), 1)) * Target)
    # Accept only if every value is positive and within InRange,
    # and the rounded total is within Tolerance of Target
    if (all(Res > 0) &&
        all(Res >= min(InRange)) &&
        all(Res <= max(InRange)) &&
        abs(sum(Res) - Target) <= Tolerance) { break }
  }
  if (isTRUE(showSum)) cat("Total = ", sum(Res), "\n")
  Res
}
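The key step is the line that takes the gaps between sorted uniform draws. Here is a minimal sketch of that step on its own, in case it helps to see why the rounded total can drift from the target. The names n, target, cuts, pieces, scaled and rounded are only illustrative and are not part of the function above.

```r
set.seed(42)
n <- 10        # illustrative vector length
target <- 100  # illustrative target sum

# n - 1 random cut points on (0, 1), bracketed by 0 and 1
cuts <- c(0, sort(runif(n - 1)), 1)

# The n gaps between consecutive cut points are non-negative and sum to 1
pieces <- diff(cuts)
stopifnot(length(pieces) == n, isTRUE(all.equal(sum(pieces), 1)))

# Scaling gives n real numbers that sum to the target (up to floating point)
scaled <- pieces * target
stopifnot(isTRUE(all.equal(sum(scaled), target)))

# Each value moves by at most 0.5 when rounded, so the rounded total
# can be off by up to n / 2; this is why the function re-checks the sum
rounded <- round(scaled)
abs(sum(rounded) - target)
```

Before rounding, the pieces always hit the target exactly. The retries in the function exist only because of the rounding and the range constraints.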
Here are some examples.
Notice the difference between the default setting and setting Tolerance = 0:
set.seed(1)
SampleToSum()
# Total = 101
# [1] 20 6 11 20 6 3 24 1 4 6
SampleToSum(Tolerance=0)
# Total = 100
# [1] 19 15 4 10 1 11 7 16 4 13
You can verify this behavior by using replicate. Here's the result of setting Tolerance = 0 and running the function 5 times.
system.time(output <- replicate(5, SampleToSum(
Target = 1376,
VecLen = 13,
InRange = 10:200,
Tolerance = 0)))
# Total = 1376
# Total = 1376
# Total = 1376
# Total = 1376
# Total = 1376
# user system elapsed
# 0.144 0.000 0.145
output
# [,1] [,2] [,3] [,4] [,5]
# [1,] 29 46 11 43 171
# [2,] 103 161 113 195 197
# [3,] 145 134 91 131 147
# [4,] 154 173 138 19 17
# [5,] 197 62 173 11 87
# [6,] 101 142 87 173 99
# [7,] 168 61 97 40 121
# [8,] 140 121 99 135 117
# [9,] 46 78 31 200 79
# [10,] 140 168 146 17 56
# [11,] 21 146 117 182 85
# [12,] 63 30 180 179 78
# [13,] 69 54 93 51 122
And the same for setting Tolerance = 5 and running the function 5 times.
system.time(output <- replicate(5, SampleToSum(
Target = 1376,
VecLen = 13,
InRange = 10:200,
Tolerance = 5)))
# Total = 1375
# Total = 1376
# Total = 1374
# Total = 1374
# Total = 1376
# user system elapsed
# 0.060 0.000 0.058
output
# [,1] [,2] [,3] [,4] [,5]
# [1,] 65 190 103 15 47
# [2,] 160 95 98 196 183
# [3,] 178 169 134 15 26
# [4,] 49 53 186 48 41
# [5,] 104 81 161 171 180
# [6,] 54 126 67 130 182
# [7,] 34 131 49 113 76
# [8,] 17 21 107 62 95
# [9,] 151 136 132 195 169
# [10,] 194 187 91 163 22
# [11,] 23 69 54 97 30
# [12,] 190 14 134 43 150
# [13,] 156 104 58 126 175
Not surprisingly, setting the tolerance to 0 makes the function slower (0.145 versus 0.058 seconds elapsed in the two runs above).
Note that since this is a "random" process, it's hard to guess how long it would take to find the right combination of numbers. For example, using set.seed(123), I ran the following test three times in a row:
system.time(SampleToSum(Target = 1163,
VecLen = 15,
InRange = 50:150))
The first run took just over 9 seconds. The second took just over 7.5 seconds. The third took... just under 381 seconds! That's a lot of variation!
Out of curiosity, I added a counter to the function, and the first run took 55026 attempts to arrive at a vector that satisfied all of our conditions! (I didn't bother counting for the second and third runs.)
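For reference, a counter can be added along these lines. This is only a sketch: SampleToSumCount and the Attempts element are names made up here, and it returns a list instead of printing the total.

```r
# Hypothetical variant of SampleToSum that also reports the number of attempts
SampleToSumCount <- function(Target = 100, VecLen = 10,
                             InRange = 1:100, Tolerance = 2) {
  Attempts <- 0
  repeat {
    Attempts <- Attempts + 1
    Res <- round(diff(c(0, sort(runif(VecLen - 1)), 1)) * Target)
    # Same acceptance test as SampleToSum
    if (all(Res > 0) &&
        all(Res >= min(InRange)) &&
        all(Res <= max(InRange)) &&
        abs(sum(Res) - Target) <= Tolerance) break
  }
  list(Values = Res, Attempts = Attempts)
}

set.seed(1)
out <- SampleToSumCount()
out$Attempts     # number of candidate vectors drawn before one was accepted
sum(out$Values)  # within Tolerance of Target
```

With tight constraints such as InRange = 50:150 the attempt count can get very large, so expect this to run for a while in those cases.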
It might be good to add some error or sanity checking into the function to make sure the inputs are reasonable. For example, one should not be able to enter SampleToSum(Target = 100, VecLen = 10, InRange = 15:50), since with a range of 15 to 50, there's no way to get to 100 AND have 10 values in your vector: ten values of at least 15 each already sum to at least 150.
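One way such a check might look is sketched below. CheckSampleToSumInputs is a hypothetical helper, not part of the function above. It only catches inputs that can never succeed, by comparing the smallest and largest possible totals against the target. Passing the check does not mean the loop will finish quickly.

```r
# Hypothetical helper: stop early when no vector can satisfy the constraints
CheckSampleToSumInputs <- function(Target, VecLen, InRange, Tolerance = 2) {
  Lo <- max(min(InRange), 1)       # SampleToSum also requires values > 0
  Hi <- max(InRange)
  MinSum <- VecLen * Lo            # smallest total the vector could have
  MaxSum <- VecLen * Hi            # largest total the vector could have
  if (MinSum > Target + Tolerance || MaxSum < Target - Tolerance) {
    stop("No vector of length ", VecLen, " with values in [", Lo, ", ", Hi,
         "] can sum to within ", Tolerance, " of ", Target,
         " (possible totals: ", MinSum, " to ", MaxSum, ")")
  }
  invisible(TRUE)
}

# Feasible: totals from 130 to 2600 are possible, so 1376 is reachable
CheckSampleToSumInputs(Target = 1376, VecLen = 13, InRange = 10:200, Tolerance = 0)

# Infeasible: the smallest possible total is 150, so this raises an error
try(CheckSampleToSumInputs(Target = 100, VecLen = 10, InRange = 15:50))
```

Calling a check like this at the top of SampleToSum would turn an endless loop into an immediate error message.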