Search code examples
rfloating-pointprecisionrcpp

Rcpp - How to compute a matrix where rowSums is exactly 1


I am trying to create a matrix with random numbers where the rowSums should exactly be 1.

I already have a condition which checks if the rowSums is not 1 and tries to correct it.

When I print out the result it looks correct but if I test if all values are 1 it gives me some FALSE values.

How can I correct that?

library(Rcpp)

cppFunction('
NumericMatrix imembrandc(int n, int k) {
  NumericMatrix u( n , k );
  IntegerVector sequ = seq(1,100);
  NumericVector sampled;
  for (int i=0; i < k; ++i) {
    sampled = sample(sequ, n);
    u(_,i) = sampled / sum(sampled);
  }

  if (is_true(any(rowSums(u) != 1))) {
    u(_,1) = u(_,1) + (1 - rowSums(u));
  }

  return(u);
}')

When I print out the rowSums of the result it looks correct:

res = imembrandc(n = 10, k = 5)
rowSums(res)

[1] 1 1 1 1 1 1 1 1 1 1

But checking it gives some FALSEs:

rowSums(res) == 1

[1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE


Solution

  • The canonical way to generate n random numbers that sum to 1 is to generate n - 1 values from [0,1), add 0 and 1 to the list and take the difference of the sorted list. Of course, this depends on the distribution you want for the random numbers. This can be expressed in R as

    set.seed(42)
    v <- diff(sort(c(0, runif(5), 1)))
    v
    #> [1] 0.28613953 0.35560598 0.18870211 0.08435842 0.02226937 0.06292459
    sum(v)
    #> [1] 1
    

    Created on 2019-05-24 by the reprex package (v0.2.1)

    In your case in C++:

    #include <Rcpp.h>
    using namespace Rcpp;
    
    // [[Rcpp::export]]
    NumericMatrix imembrandc(int n, int k) {
      NumericMatrix u(n, k);
      for (int i = 0; i < n; ++i) {
        NumericVector row = runif(k - 1);
        row.push_back(0.0);
        row.push_back(1.0);
        u(i, _) = diff(row.sort());
      }
      return u;
    }
    
    /*** R
    set.seed(42)
    res = imembrandc(n = 10, k = 5)
    rowSums(res)
    rowSums(res) == 1
    all.equal(rowSums(res),rep(1, nrow(res)))
    */
    

    Note that I am generating rows to begin with, while you were generating columns and then tried to correct the rowSum. Output:

    > set.seed(42)
    
    > res = imembrandc(n = 10, k = 5)
    
    > rowSums(res)
     [1] 1 1 1 1 1 1 1 1 1 1
    
    > rowSums(res) == 1
     [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
    
    > all.equal(rowSums(res),rep(1, nrow(res)))
    [1] TRUE
    

    BTW, all.equal gives TRUE also for your matrix, since the difference is really small. But I find it better to avoid the problem from the beginning.