I am wondering what the best way to solve this is. Essentially I want to generate 20 samples which add to 100 but also where (x1+x2>20). I am struggling to get something that is fast and efficient. I realise that I could filter out the lines that don't meet this criteria but it isn't efficient if I generate 10,000 rather than 20.
The code is as below:
n = 20
x1 = sample(0:100,n,replace = TRUE)
x2 = sample(0:100,n,replace = TRUE)
x3 = sample(0:100,n,replace = TRUE)
index = (x1+x2+x3)>100
G=(x1+x2)>20
while(sum(index)>0&&sum(G)>0){
x1[index&&G] = sample(0:100,n,replace = TRUE)
x2[index&&G] = sample(0:100,n,replace = TRUE)
x3[index&&G] = sample(0:100,n,replace = TRUE)
index =(x1+x2+x3)>100
G=(x1+x2)>20
}
x4=rep(100,n)-x1-x2-x3
df <- data.frame(x1,x2,x3,x4)
Thanks in advance.
Here is an unbiased way to pick k numbers in the range 0:n which sum to n. It is based on the stars and bars encoding:
#picks k random numbers in range 0:n which sum to n:
pick <- function(k,n){
m <- n + k - 1 #number of stars and bars
bars <- sort(sample(1:m,k-1)) #positions of the bars
c(bars,m+1)-c(0,bars)-1
}
This generates a single example, returning a vector. As @Guillaume Devailly observes in their answer, most of the samples will satisfy the additional constraint on the sum of the first 2 numbers, so you can just filter out those that don't.
Note that if you want 4 numbers in the range 1:100 which sum to 100 you could just use 1 + pick(4,96)
.
To enforce the constraint on the first two numbers:
pick.sample <- function(){
while(TRUE){
x <- pick(4,100)
if(sum(x[1:2]) >20) return(x)
}
}
Then
df <- data.frame(t(replicate(10000,pick.sample())))
will create a 10,000 row dataframe where each row is a sample which satisfies the constraints.