
How to handle arithmetic operations with extremely large numbers?


I am trying to compute the cumulative distribution function (CDF) of a particular type of order statistic: the progressively censored uniform order statistic. I have written the following R code:

#The CDF for rth order progressively censored uniform order statistics
n <- 30 #Total number of experimental units
m <- 15 #Desired number of failures
R <- c(rep(0, m - 1), n - m) #Progressive censoring scheme
order <- m #Order of the censored order statistics (Here, maximum order)

gam <- NA
Cr <- NA
for(i in 1 : order)
{
  gam[i] <- m - i + 1 + sum(R[i : m])
}
for(i in 1 : order)
{
  Cr[i] <- prod(gam[1 : i])
}
air <- array(dim = c(order, order))

for(i in 1 : order)
{
  for (j in 1 : order) {
    
    if(i != j)
    {
      air[i, j] <- 1/(gam[j] - gam[i])
    }
  }
}
A <- NA
for(i in 1 : order)
{
  A[i] = prod(na.omit(air[i,]))
}
#CDF of progressively censored uniform order statistics
progU_CDF <- function(u)
{
  CDF = NA
  for(i in 1 : length(u))
  {
    CDF[i] <- 1 - (Cr[order] * sum((A/gam) * ((1 - u[i])^(gam))))
  }
  return(CDF)
}

Now, progU_CDF(0) should give 0 and progU_CDF(1) should give 1. In this case, however, progU_CDF(0) produces a number very close to 0, not exactly 0. Mathematically, the idea is that when u = 0, Cr[order] * sum(A/gam) = 1.
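
For reference, this is the formula the code above appears to implement, read off from `gam`, `Cr`, `A` and `progU_CDF`. The symbols \gamma_i, c_{r-1} and a_{i,r} are notation assumed here for `gam[i]`, `Cr[order]` and `A[i]`, with r standing for `order`; they do not appear in the code itself.

```latex
\gamma_i = m - i + 1 + \sum_{j=i}^{m} R_j, \qquad
c_{r-1} = \prod_{j=1}^{r} \gamma_j, \qquad
a_{i,r} = \prod_{\substack{j=1 \\ j \neq i}}^{r} \frac{1}{\gamma_j - \gamma_i},

F(u) = 1 - c_{r-1} \sum_{i=1}^{r} \frac{a_{i,r}}{\gamma_i} \, (1 - u)^{\gamma_i},
\qquad 0 \le u \le 1 .
```

Setting u = 0 and requiring F(0) = 0 gives the identity c_{r-1} \sum_{i=1}^{r} a_{i,r} / \gamma_i = 1.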

Moreover, when I plot the CDF, the shape is as expected, i.e. monotonically non-decreasing.

plot(seq(0, 1, 0.01), progU_CDF(seq(0, 1, 0.01)), type = "l")

[Figure: order statistics CDF plot, n = 30, m = 15]

However, things go wrong when I take n = 50 and m = 25. Nothing else is changed, but Cr[order] * sum(A/gam) is nowhere close to 1, so progU_CDF(0) is far from 0. And the CDF plot looks like this:

[Figure: order statistics CDF plot, n = 50, m = 25]

I suspect this is caused by arithmetic operations on extremely large numbers, but I'm unable to trace it.

What's more confusing is that Cr[order] * sum(A/gam) and sum(Cr[order] * A/gam) produce two different numbers, which is counterintuitive, since Cr[order] is a constant.
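
The discrepancy between the two expressions is what double-precision rounding would be expected to produce here. As a rough sketch, compare the size of the largest term in the sum with the machine epsilon. The helper `term_sizes` is hypothetical and assumes the censoring scheme R = (0, ..., 0, n - m) from the code above, for which gam[i] = n - i + 1.

```r
# Hypothetical helper: the individual terms Cr[m] * A[i] / gam[i] in double
# precision, assuming R = (0, ..., 0, n - m), so that gam[i] = n - i + 1.
term_sizes <- function(n, m) {
  gam <- n - seq_len(m) + 1
  d <- outer(gam, gam, function(gi, gj) gj - gi)  # d[i, j] = gam[j] - gam[i]
  diag(d) <- 1                                    # leave the i == j factor out of the product
  A <- 1 / apply(d, 1, prod)
  prod(gam) * A / gam
}

# Rough scale of the rounding error when the terms are added:
# largest term times machine epsilon.
err_small <- max(abs(term_sizes(30, 15))) * .Machine$double.eps
err_large <- max(abs(term_sizes(50, 25))) * .Machine$double.eps
c(err_small, err_large)
```

The exact sum is 1 in both cases. For n = 30, m = 15 this error scale is well below 1, while for n = 50, m = 25 it exceeds 1, so no digit of a double-precision result can be trusted and the order in which the products and sums are evaluated changes the answer.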

My question is, why is it working for n = 30, m = 15 but not for n = 50, m = 25? Is there any way to deal with such large numbers so that Cr[order] * sum(A/gam) is always close to 1 whenever u = 0, no matter the value of n and m?


Solution

  • A is a vector of very small numbers of alternating sign, which are then multiplied by the very large Cr[order], so the sum cancels almost completely. The rounding errors compound so quickly that performing the calculations in log scale still results in large errors. Multiple precision with Rmpfr seems to be the way to go.

    Vectorized with Rmpfr:

    library(Rmpfr)
    
    fprogU_CDF <- function(n, m, precBits = 128) {
      R <- c(numeric(m - 1), n - m) #Progressive censoring scheme
      order <- m #Order of the censored order statistics (Here, maximum order)
      
      gam <- mpfr(rev(cumsum(R[m:order])) + m:(m - order + 1), precBits)
      Cr <- cumprod(gam[1:order])
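      #outer(gam, gam, "-")[i, j] is gam[i] - gam[j]; negate to get 1/(gam[j] - gam[i]),
      #as in the question, so that the sign is also right when m is even
      air <- -1/outer(gam, gam, "-")
      diag(air) <- 1 #exclude the i == j factor from the row products
      A <- apply(air, 1, prod)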
      #CDF of progressively censored uniform order statistics
      function(u) {
        as.numeric(1 - Cr[order]*colSums(A/gam*outer(gam, 1 - u, \(x, y) y^x)))
      }
    }
    

    Testing with n = 30 and m = 15.

    progU_CDF <- fprogU_CDF(30, 15)
    progU_CDF(0)
    #> [1] 2.15461e-27
    progU_CDF(1)
    #> [1] 1
    curve(progU_CDF(x), 0, 1)
    


    Testing with n = 50 and m = 25.

    progU_CDF <- fprogU_CDF(50, 25, 1024)
    progU_CDF(0)
    #> [1] 4.050669e-288
    progU_CDF(1)
    #> [1] 1
    curve(progU_CDF(x), 0, 1)
    



    Alternative function

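    A more streamlined alternative that uses closed-form coefficients. It relies on this particular censoring scheme, R = (0, ..., 0, n - m), for which gam[i] = n - i + 1: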

    fprogU_CDF <- function(n, m, precBits = 128) {
      d <- mpfr((m - 1):0, precBits) #m - i for i = 1, ..., m
      g <- mpfr(n:(n - m + 1), precBits) #gam[i] = n - i + 1
      #Cr[m]*A[i]/gam[i], where A[i] = (-1)^(m - i)/((i - 1)!*(m - i)!)
      C <- factorial(mpfr(n, precBits))/factorial(mpfr(n - m, precBits))/factorial(d)/
        factorial(rev(d))*rev(rep(c(1L, -1L), length.out = m))/g
      #CDF of progressively censored uniform order statistics
      function(u) as.numeric(1 - colSums(C*outer(g, 1 - u, \(x, y) y^x)))
    }
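
As a sanity check on any of these implementations, the result can be compared with a known closed form. This sketch assumes the scheme R = (0, ..., 0, n - m) used above: no units are withdrawn before the m-th failure, so the statistic is the ordinary m-th order statistic of n uniforms, whose CDF is that of a Beta(m, n - m + 1) distribution. The helper `cdf_double` is hypothetical and works in plain double precision, so it is only meant for small n and m.

```r
# Hypothetical double-precision version of the closed-form coefficients,
# assuming R = (0, ..., 0, n - m), i.e. gam[i] = n - i + 1.
cdf_double <- function(u, n, m) {
  i   <- seq_len(m)
  gam <- n - i + 1
  # log of |Cr[m] * A[i] / gam[i]|, with |A[i]| = 1 / ((i - 1)! * (m - i)!)
  logC <- lfactorial(n) - lfactorial(n - m) -
    lfactorial(i - 1) - lfactorial(m - i) - log(gam)
  C <- (-1)^(m - i) * exp(logC)          # sign of A[i] is (-1)^(m - i)
  vapply(u, function(x) 1 - sum(C * (1 - x)^gam), numeric(1))
}

u <- seq(0, 1, by = 0.1)
# Largest absolute deviation from the Beta(m, n - m + 1) CDF for n = 10, m = 5
max_dev <- max(abs(cdf_double(u, 10, 5) - pbeta(u, 5, 10 - 5 + 1)))
max_dev
```

For larger n and m the same comparison can be made against the Rmpfr-based function instead, e.g. `max(abs(fprogU_CDF(50, 25, 1024)(u) - pbeta(u, 25, 26)))`.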