Ignoring values or NAs in the sample function


I have a matrix in R and would like to take a single random sample from each row. Some of the values are NA, and I do not want NA to be a possible result of the sampling. How can I accomplish this?

For example,

a <- matrix(c(rep(5, 10), rep(10, 10), rep(NA, 5)), ncol=5, nrow=5)
a
     [,1] [,2] [,3] [,4] [,5]
[1,]    5    5   10   10   NA
[2,]    5    5   10   10   NA
[3,]    5    5   10   10   NA
[4,]    5    5   10   10   NA
[5,]    5    5   10   10   NA

When I apply the sample function to each row of this matrix and collect the results in another matrix, I get:

b <- matrix(apply(a, 1, sample, size=1), ncol=1)
b

     [,1]
[1,]   NA
[2,]   NA
[3,]   10
[4,]   10
[5,]    5

Instead, I want NA never to be returned, so that the output looks something like:

b
     [,1]
[1,]   10
[2,]   10
[3,]   10
[4,]    5
[5,]   10

Solution

  • There might be a better way, but sample doesn't appear to have any parameters related to NAs, so I wrote an anonymous function that drops the NAs before sampling.

    apply(a, 1, function(x){sample(x[!is.na(x)], size = 1)})
    

    essentially does what you want. If you really want the matrix output you could do

    b <- matrix(apply(a, 1, function(x){sample(x[!is.na(x)], size = 1)}), ncol = 1)
    

    Edit: You didn't ask for this, but my proposed solution does fail in certain cases (mainly if a row contains ONLY NAs).

    a <- matrix (c(rep(5, 10), rep(10, 10), rep(NA, 5)), ncol=5, nrow=5)
    # My solution works fine with your example data
    apply(a, 1, function(x){sample(x[!is.na(x)], size = 1)})
    
    # What happens if a row contains only NAs
    a[1,] <- NA
    
    # Now it fails with an error: row 1 has no non-NA values left to sample from
    apply(a, 1, function(x){sample(x[!is.na(x)], size = 1)})
    
    # We can rewrite the function to deal with that case
    mysample <- function(x, ...){
        if(all(is.na(x))){
            return(NA)
        }
        return(sample(x[!is.na(x)], ...))
    }
    
    # Using the new function things work.
    apply(a, 1, mysample, size = 1)
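
    One more edge case may be worth guarding against: when x is a single number that is at least 1, sample(x, ...) draws from 1:x rather than from x itself, so a row with exactly one non-NA value (say 10) could return anything from 1 to 10. A minimal sketch of a variant that samples by position instead is below; safe_sample is a hypothetical name, not something from base R or from the code above, and the extra rows are made up for illustration.

    ```r
    # Hypothetical helper (an assumption, not part of the original answer):
    # drop NAs, then pick by position so a single remaining value is
    # returned as-is instead of being expanded to 1:x by sample().
    safe_sample <- function(x, size = 1) {
        keep <- x[!is.na(x)]
        if (length(keep) == 0) {
            return(rep(NA, size))   # nothing to draw from
        }
        keep[sample.int(length(keep), size = size)]
    }

    a <- matrix(c(rep(5, 10), rep(10, 10), rep(NA, 5)), ncol = 5, nrow = 5)
    a[1, ] <- NA                       # a row containing only NAs
    a[2, ] <- c(10, NA, NA, NA, NA)    # a row with a single non-NA value

    apply(a, 1, safe_sample)
    ```

    With this version the first element is always NA and the second is always 10; the remaining rows still give 5 or 10 at random.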