r, random, sampling

Random sampling to give an exact sum


I want to sample 140 numbers between 1000 and 100000 such that the sum of these 140 numbers is around 2 million (2000000):

sample(1000:100000,140)

such that:

sum(sample(1000:100000,140)) == 2000000

Any pointers on how I can achieve this?


Solution

  • There exists an algorithm for generating such random numbers.

    It was originally written for MATLAB, and an R implementation is available:

    Surrogate::RandVec

    Quoted from the comment in the MATLAB script:

    %   This generates an n by m array x, each of whose m columns
    % contains n random values lying in the interval [a,b], but
    % subject to the condition that their sum be equal to s.  The
    % scalar value s must accordingly satisfy n*a <= s <= n*b.  The
    % distribution of values is uniform in the sense that it has the
    % conditional probability distribution of a uniform distribution
    % over the whole n-cube, given that the sum of the x's is s.
    %
    %   The scalar v, if requested, returns with the total
    % n-1 dimensional volume (content) of the subset satisfying
    % this condition.  Consequently if v, considered as a function
    % of s and divided by sqrt(n), is integrated with respect to s
    % from s = a to s = b, the result would necessarily be the
    % n-dimensional volume of the whole cube, namely (b-a)^n.
    %
    %   This algorithm does no "rejecting" on the sets of x's it
    % obtains.  It is designed to generate only those that satisfy all
    % the above conditions and to do so with a uniform distribution.
    % It accomplishes this by decomposing the space of all possible x
    % sets (columns) into n-1 dimensional simplexes.  (Line segments,
    % triangles, and tetrahedra, are one-, two-, and three-dimensional
    % examples of simplexes, respectively.)  It makes use of three
    % different sets of 'rand' variables, one to locate values
    % uniformly within each type of simplex, another to randomly
    % select representatives of each different type of simplex in
    % proportion to their volume, and a third to perform random
    % permutations to provide an even distribution of simplex choices
    % among like types.  For example, with n equal to 3 and s set at,
    % say, 40% of the way from a towards b, there will be 2 different
    % types of simplex, in this case triangles, each with its own
    % area, and 6 different versions of each from permutations, for
    % a total of 12 triangles, and these all fit together to form a
    % particular planar non-regular hexagon in 3 dimensions, with v
    % returned set equal to the hexagon's area.
    %
    % Roger Stafford - Jan. 19, 2006
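
The condition n*a <= s <= n*b from the quoted comment can be checked for the numbers in the question before calling any sampler. A minimal sketch in base R, using only the values from the question (the variable names simply mirror the symbols in the comment):

```r
# Values taken from the question: 140 numbers in [1000, 100000] summing to 2000000.
n <- 140
a <- 1000
b <- 100000
s <- 2000000

# Smallest and largest sums that n values in [a, b] can reach.
lower <- n * a
upper <- n * b

# The target sum must lie between those two bounds.
feasible <- lower <= s && s <= upper

print(c(lower = lower, target = s, upper = upper))
print(feasible)

# Average value each number must have for the sum to come out right.
print(s / n)
```

Here the bounds are 140000 and 14000000, so a sum of 2000000 is feasible. The required mean is about 14285.7, which sits much closer to a than to b, so most of the sampled values will have to be small.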
    

    Example:

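    # Seed is drawn at random here, so each run gives a different vector;
    # passing a fixed integer instead should make the result reproducible.
    test <- Surrogate::RandVec(a = 1000, b = 100000, s = 2000000, n = 140, m = 1,
                               Seed = sample(1:1000, size = 1))
    sum(test$RandVecOutput)
    # 2000000
    hist(test$RandVecOutput)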
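
The question samples from 1000:100000, i.e. whole numbers, whereas the algorithm described above works on a continuous interval, so its output is generally not integer-valued. If integers are needed, one possible follow-up step is to round while keeping the total fixed. The sketch below uses a largest-remainder rule; `round_preserving_sum` is a hypothetical helper written for this illustration, not part of Surrogate, and it assumes the target total is a whole number.

```r
# Hypothetical helper (not part of Surrogate): round a numeric vector to
# whole numbers while keeping its sum equal to `total`.
round_preserving_sum <- function(x, total = round(sum(x))) {
  lower <- floor(x)
  # Number of units lost by rounding everything down.
  shortfall <- as.integer(round(total - sum(lower)))
  stopifnot(shortfall >= 0, shortfall <= length(x))
  if (shortfall > 0) {
    # Give one extra unit to the entries with the largest fractional parts.
    bump <- order(x - lower, decreasing = TRUE)[seq_len(shortfall)]
    lower[bump] <- lower[bump] + 1
  }
  lower
}

# Small illustrative input (made-up values summing to 10000).
x <- c(1200.4, 3500.7, 5298.9)
ints <- round_preserving_sum(x, total = 10000)
print(ints)
print(sum(ints))

# Assumed usage with the example above:
# ints <- round_preserving_sum(as.vector(test$RandVecOutput), total = 2000000)
```

Rounding this way moves each value by less than 1, so the result stays within [a, b] when a and b are integers. It does not guarantee distinct values the way sample() without replacement does, and the rounded vector is no longer exactly uniform over the constrained set.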