
Ignore NA values in a sample function in R


I am trying to ignore NA values in a sampling function. This follows on from an earlier question of mine about sampling between start and end values within a loop in R: Sample using start and end values within a loop in R.

I found a solution to that issue using mapply: df[j,4] <- mapply(function(x, y) sample(seq(x, y), 1), df[j,"start"], df[j,"end"]). I've now returned to the problem, but I'm having trouble with NA values: seq() stops with an error when start or end is NA. Normally I would just filter out rows with NA values in the start and end columns, but other parts of the loop refer to the rows that would be removed. I've checked other threads that suggest na.omit or na.rm, but, as I said, filtering out rows with NA values causes other problems in my code, and sample() has no na.rm argument. Is there another workaround?
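A minimal reproduction of the problem, assuming two hypothetical rows (one complete, one with both bounds missing) passed through the same mapply call. The failure comes from seq(), which requires a finite `from` value, before sample() is ever reached:

```r
# Hypothetical two-row example: the second row has missing bounds
start <- c(25, NA)
end   <- c(67, NA)

# The original mapply call, without any NA handling, wrapped so the error is captured
res <- tryCatch(
  mapply(function(x, y) sample(seq(x, y), 1), start, end),
  error = function(e) e
)

# seq() rejects the missing 'from' value, so the whole mapply call fails
print(conditionMessage(res))
```

Because the error aborts mapply entirely, even the complete first row gets no value, which is why the NA check has to happen inside the function before seq() is called.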

I've used the same data set as in my previous question, with a few NA values added. I'd like to end up with something like this:

ID  start  end  sampled
a   25     67   44
b   36     97   67
c   23     85   77
d   15     67   52
e   21     52   41
f   NA     NA   NA
g   39     55   49
h   27     62   35
i   11     99   17
j   21     89   66
k   NA     NA   NA
l   44     58   48
m   16     77   22
n   25     88   65

Here's a sample data set to use:

df <- structure(list(ID = c("a", "b", "c", "d", "e", "f", "g", "h", 
"i", "j", "k", "l", "m", "n"), start = c(25, 36, 23, 15, 21, 
NA, 39, 27, 11, 21, NA, 44, 16, 25), end = c(67, 97, 85, 67, 
52, NA, 55, 62, 99, 89, NA, 58, 77, 88), sampled = c(NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L), spec = structure(list(
    cols = list(ID = structure(list(), class = c("collector_character", 
    "collector")), start = structure(list(), class = c("collector_double", 
    "collector")), end = structure(list(), class = c("collector_double", 
    "collector")), sampled = structure(list(), class = c("collector_logical", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))

Solution

  • A simple way is to check for NA values inside the function passed to mapply and return NA for those rows:

    # Return NA when either bound is missing; otherwise draw one value from seq(x, y)
    df$sampled <- mapply(function(x, y) if (is.na(x) || is.na(y)) NA else
                                        sample(seq(x, y), 1), df$start, df$end)
    

    Or, since this is part of a larger loop that uses j to index rows:

    df[j,4] <- mapply(function(x, y) if(is.na(x) || is.na(y)) NA else 
                      sample(seq(x, y), 1), df[j,"start"], df[j,"end"])
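One caveat with sample(seq(x, y), 1): when start equals end, seq(x, y) has length 1, and sample() treats a single number n (n >= 1) as 1:n, so it can return a value below start. A sketch of an alternative that fills only the complete rows through a logical mask (no rows are removed) and draws with sample.int(). The helper name sample_between and the toy data are hypothetical, and the sketch assumes start <= end with whole-number bounds:

```r
# Hypothetical helper: one uniform integer in [start, end] per row,
# NA where either bound is missing (assumes start <= end, whole numbers)
sample_between <- function(start, end) {
  out <- rep(NA_real_, length(start))
  ok <- !is.na(start) & !is.na(end)   # rows with both bounds present
  n <- end[ok] - start[ok] + 1         # number of candidate values per row
  # sample.int(k, 1) returns a value in 1..k; vapply keeps the result integer
  draws <- vapply(n, function(k) sample.int(k, 1), integer(1))
  # Shift 1..k onto start..end; this is also correct when start == end
  out[ok] <- start[ok] + draws - 1
  out
}

# Hypothetical toy data: a normal row, an NA row, and a row with start == end
toy <- data.frame(
  ID    = c("a", "f", "z"),
  start = c(25, NA, 5),
  end   = c(67, NA, 5)
)

set.seed(42)
toy$sampled <- sample_between(toy$start, toy$end)
print(toy)
```

Because the helper is vectorised, it can fill the whole column at once or be called with single values inside the j loop, e.g. sample_between(df$start[j], df$end[j]).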