Search code examples
rrandominteger-division

How to randomly divide an integer into a fixed number of integers, such that the obtained tuples are uniformly distributed?


Based on this reply: Random numbers that add to 100: Matlab I tried to apply the suggested method to randomly divide an integer into a fixed number of integers whose sum is equal to the integer. Although that method seems to result in a uniformly distributed set of points when the values are not integers, in the case of integers, the resulting tuples are not obtained with equal probability. This is shown by the following implementation in R, where a simple case is tested with 3 divisors and with the integer to be divided equal to 5:

# Randomly divide an integer into a defined number of integers 
# Goal: obtain with equal probability any combination of variable values, with the condition that sum(variables) = dividend.

# install.packages(rgl)  # Install rgl package if not yet installed. This allows to use the plot3d function to create a 3D scatterplot.
library(rgl)

n_draws = 10000
n_variables = 3  # Number of divisors. These need to be randomly calculated. Their value must be in the interval [0:dividend] and their sum must be equal to the dividend. Two variables can have the same value.
dividend = 5  # Number that needs to be divided.
rand_variables = matrix(nrow = n_draws, ncol = n_variables)  # This matrix contains the final values for each variable (one column per variable).
rand_samples = matrix(nrow = n_draws, ncol = n_variables-1)  # This matrix contains the intermediate values that are used to randomly divide the dividend.

for (k in 1:n_draws){
    rand_samples[k,] = sample(x = c(0:dividend), size = n_variables-1, replace = TRUE)  # Randomly select (n_variables - 1) values within the range 0:dividend. The values in rand_samples are uniformly distributed.
    midpoints = sort(rand_samples[k,])
    rand_variables[k,] = sample(diff(c(0, midpoints, dividend)), n_variables)  # Calculate the values of each variable such that their sum is equal to the dividend.
}

plot3d(rand_variables)  # Create a 3D scatterplot showing the values of rand_variables. This plot does not show how frequently each combination of values of the n_variables is obtained, only which combinations of values are possible.

table(data.frame(rand_variables))  # This prints out the count of each combination of values of n_variables. It shows that the combinations of values in the corners (e.g. (5,0,0)) are obtained less frequently than other combinations (e.g. (1,2,2)).

The last line gives the following output, which shows how many times were obtained each combination of values of (X1, X2, X3) that respect the condition X1 + X2 + X3 = 5:

, , X3 = 0

   X2
X1    0   1   2   3   4   5
  0   0   0   0   0   0 397
  1   0   0   0   0 471   0
  2   0   0   0 469   0   0
  3   0   0 446   0   0   0
  4   0 456   0   0   0   0
  5 358   0   0   0   0   0

, , X3 = 1

   X2
X1    0   1   2   3   4   5
  0   0   0   0   0 450   0
  1   0   0   0 539   0   0
  2   0   0 560   0   0   0
  3   0 588   0   0   0   0
  4 426   0   0   0   0   0
  5   0   0   0   0   0   0

, , X3 = 2

   X2
X1    0   1   2   3   4   5
  0   0   0   0 428   0   0
  1   0   0 603   0   0   0
  2   0 549   0   0   0   0
  3 461   0   0   0   0   0
  4   0   0   0   0   0   0
  5   0   0   0   0   0   0

, , X3 = 3

   X2
X1    0   1   2   3   4   5
  0   0   0 500   0   0   0
  1   0 549   0   0   0   0
  2 455   0   0   0   0   0
  3   0   0   0   0   0   0
  4   0   0   0   0   0   0
  5   0   0   0   0   0   0

, , X3 = 4

   X2
X1    0   1   2   3   4   5
  0   0 465   0   0   0   0
  1 458   0   0   0   0   0
  2   0   0   0   0   0   0
  3   0   0   0   0   0   0
  4   0   0   0   0   0   0
  5   0   0   0   0   0   0

, , X3 = 5

   X2
X1    0   1   2   3   4   5
  0 372   0   0   0   0   0
  1   0   0   0   0   0   0
  2   0   0   0   0   0   0
  3   0   0   0   0   0   0
  4   0   0   0   0   0   0
  5   0   0   0   0   0   0

As the output shows, the combinations of values in the corners of the plane (e.g. (5,0,0)) are obtained less frequently than other tuples.

How can I obtain any integer tuple with the same probability?

I'm looking for a solution that is applicable for any positive integer and for any number of divisors.


Solution

  • I think trying to make these combinations/permutations manually is reinventing the wheel. There are efficient algorithms to do this implemented in partitions. For example,

    library(partitions)                            # compositions, parts, restrictedparts may be of interest
    sample_size <- 1000
    pool <- compositions(5, 3)                     # pool of possible tuples
    samp <- sample(ncol(pool), sample_size, TRUE)  # sample uniformly
    
    ## These are you sampled tuples, each column
    z <- matrix(pool[,samp], 3)
    

    Side note: don't use a data.frame, use a matrix to store a set of integers. data.frames will be entirely copied every time you modify something ([.data.frame is not a primitive), whereas matrices will modify in place.