I would like to sample values from a vector s<-0:1440 to create a vector u so that sum(u)=x while length(u)<k, for given k and x. Obviously k*max(s)>sum(u).
Is there any way to brute-force simulate numerous such u vectors? I would like to avoid messing with probability distributions (for the sampling), and I don't care if some u vectors get discarded.
EDIT: Regarding P Lapointe's nice comment about length(u): it is important that length(u) is not fixed (length(u)<k), so that the vectors u are of variable lengths. Another approach would be to fix length(u)=k, but then the algorithm should be able to randomly insert (simulate) zeros in the u vectors. Adding a zero leaves sum(u) unchanged but increases length(u) by one (until length(u)=k). It is important that the zeros appear at random positions (not just at the end of the simulated vector merely to satisfy length(u)=k).
OK, here's an algorithm that answers your question. Basically, we draw two random samples. The first picks a length k below max.k, which satisfies the length(u)<k constraint from the question (max.k plays the role of the question's k). Using that k, the second sample draws k-1 numbers; this is called initial in the code. When we find k-1 numbers whose sum does not exceed x, the desired sum, and whose remainder x-sum(initial) is itself no larger than max(s1), we append that remainder to complete the series.
#Inputs
x     <- 2500               # desired sum
s1    <- 0:min(1440, x)     # universe
max.k <- 10
# length(u) < max.k; starts at 3 because low k can be problematic with current inputs
k     <- sample(3:(max.k - 1), 1)
initial <- x + 1            # deliberately above limit to initialize the while
u       <- s1 + 1           # deliberately above limit to initialize the while
while (sum(initial) > x || max(u) > max(s1)) {
  initial <- sample(s1, k - 1, replace = TRUE)  # find k-1 samples
  u <- c(initial, x - sum(initial))             # add the number that makes sum(u) == x
}
#example
> k
[1] 4
> x
[1] 2500
> u
[1] 282 1337 876 5
> sum(u)
[1] 2500
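Since the question asks for numerous such vectors, the loop above could be wrapped in a function and repeated. The following is only a minimal sketch: the names sample_u and min.k are hypothetical (they are not part of the code above), and it assumes s is a contiguous range such as 0:1440, so that any remainder between 0 and max(s1) is itself a valid value.

```r
# Hypothetical helper; sample_u and min.k are illustrative names, not from the answer.
# Assumes s is a contiguous integer range starting at 0 (e.g. 0:1440).
sample_u <- function(x, max.k, s = 0:1440, min.k = 3) {
  s1 <- s[s <= x]                          # values above x can never be used
  stopifnot(max.k > min.k,                 # at least one admissible length
            min.k * max(s1) >= x)          # otherwise the loop could never end
  ks <- min.k:(max.k - 1)                  # admissible lengths, all below max.k
  k  <- ks[sample.int(length(ks), 1)]      # index-based draw avoids sample()'s
                                           # special case for length-one input
  repeat {
    initial <- s1[sample.int(length(s1), k - 1, replace = TRUE)]  # k-1 free draws
    last    <- x - sum(initial)            # remainder that forces sum(u) == x
    if (last >= 0 && last <= max(s1)) {    # accept only if the remainder is valid
      return(c(initial, last))
    }
  }
}

# Usage: simulate several vectors of varying length
sims <- replicate(5, sample_u(x = 2500, max.k = 10), simplify = FALSE)
sapply(sims, sum)     # each sum equals 2500 by construction
sapply(sims, length)  # lengths vary, always below max.k
```

Rejected draws are simply discarded, which matches the brute-force spirit of the question; for long vectors the loop may need many iterations before a draw is accepted.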
Also, if you have a large max.k, it might be a good idea to add a probability vector that gives more weight to low numbers in the sample. Otherwise, in the current example, it is hard to reach sum==2500 if you draw several numbers above 1000.
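prob1 <- 1 / ((s1 + 1) * max.k)  # gives more probability to low numbers
# reset the sentinels; otherwise the loop is skipped because the previous u already passes
initial <- x + 1
u       <- s1 + 1
while (sum(initial) > x || max(u) > max(s1)) {
  initial <- sample(s1, k - 1, replace = TRUE, prob = prob1)  # find k-1 samples
  u <- c(initial, x - sum(initial))                           # add the number that makes sum(u) == x
}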
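The question's EDIT also mentions the alternative of fixing length(u)=k and inserting zeros at random positions. One possible way to do that, sketched below, is to start from k zeros and place the values of u into randomly chosen slots while keeping their original order. The function name insert_zeros is hypothetical and not part of the code above.

```r
# Hypothetical helper; insert_zeros is an illustrative name, not from the answer.
insert_zeros <- function(u, k) {
  stopifnot(length(u) <= k)
  out   <- numeric(k)                       # start with k zeros
  slots <- sort(sample.int(k, length(u)))   # random positions, sorted to keep u's order
  out[slots] <- u                           # remaining positions stay zero
  out
}

# Usage with the example vector from above
u <- c(282, 1337, 876, 5)
v <- insert_zeros(u, k = 10)
sum(v)     # 2500, zeros do not change the sum
length(v)  # 10
```

Because the slots are drawn uniformly without replacement, the zeros can land anywhere in the vector rather than only at the end.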