Search code examples
rprobabilitymissing-data

Why am I getting NAs in this calculation in R?


While working on an Rcpp program, I used the sample() function, which gave me the following error: "NAs not allowed in probability." I traced this issue to the fact that the probability vector I used had NA values in it. I have no idea how. Below is some R code that captures the errors:

n.0=20
n.1=20
n.reps=1
beta0.vals=rep(seq(-.3,.1,,n.0),n.reps)
beta1.vals=rep(seq(-7,0,,n.1),n.reps)
beta.grd=as.matrix(expand.grid(beta0.vals,beta1.vals))

n.rnd=200
beta.rnd.grd=cbind(runif(n.rnd,min(beta0.vals),max(beta0.vals)),runif(n.rnd,min(beta1.vals),max(beta1.vals)))
beta.grd=rbind(beta.grd,beta.rnd.grd)
  
N = 22670
count = 0

for(i in 1:dim(beta.grd)[1]){ # iterate through 600 possible beta values in beta grid
    
  beta.ind = 0 # indicator for current pair of beta values
    
  for(j in 1:N){ # iterate through all possible Nsums
    logit = beta.grd[i,1]/N*(j - .1*N)^2 + beta.grd[i,2];
    phi01 = exp(logit)/(1 + exp(logit))
      
    if(is.na(phi01)){ 
      count = count + 1
    }
  }
}

cat("Total number of invalid probabilities: ", count)

Here, $\beta_0 \in (-0.3, 0.1), \beta_1 \in (-7, 0), N = 22670, N_\text{sum} \in (1, N)$. Note that $N$ and $N_\text{sum}$ are integers, whereas the beta values may not be.

Since mathematically, $\phi_{01} \in (0,1)$, I'm assuming that NAs are arising because R is not liking extremely small values. I am receiving an overwhelming amount of NA values, too. More so than numbers. Why would I be getting NAs in this code?


Solution

  • Bernhard's answer correctly identifies the problem: If logit is large, exp(logit) = Inf. Here is a solution:

    for(i in 1:dim(beta.grd)[1]){ # iterate through 600 possible beta values in beta grid
        
        beta.ind = 0 # indicator for current pair of beta values
        
        for(j in 1:N){ # iterate through all possible Nsums
            logit = beta.grd[i,1]/N*(j - .1*N)^2 + beta.grd[i,2];
            ## This one isn't great because exp(logit) can be very large
            # phi01 = exp(logit)/(1 + exp(logit))
            ## So, we say instead
            ## phi01 = 1 / ( 1 + exp(-logit) )
            phi01 = plogis(logit)
            
            
            if(is.na(phi01)){ 
                count = count + 1
            }
        }
    }
    
    cat("Total number of invalid probabilities: ", count)
    # Total number of invalid probabilities:  0
    

    We can use the more stable 1 / (1 + exp(-logit) (to convince yourself of this, multiply your expression with exp(-logit) / exp(-logit)), and luckily either way, R has a builtin function plogis() that can calculate these probabilities quickly and accurately. You can see from the help file (?plogis) that this function evaluates the expression I gave, but you can also double check to assure yourself

    x = rnorm(1000)
    y = 1 / (1 + exp(-x))
    z = plogis(x)
    all.equal(y, z)
    [1] TRUE