I am trying to ignore NA values in a sampling function. This relates to an earlier question of mine about sampling using start and end values within a loop in R: Sample using start and end values within a loop in R.
I found a solution to that issue using mapply:
df[j,4] <- mapply(function(x, y) sample(seq(x, y), 1), df[j,"start"], df[j,"end"])
I've returned to this problem, but I'm having difficulty dealing with NA values. Normally I would just filter out rows with NA values in the start and end columns, but other parts of the loop refer to the rows that would be removed. I've checked other threads that suggest na.omit or na.rm, but as I said, filtering out rows with NA values causes other issues in my code, and I don't think sample has an na.rm argument, so I'm looking for another workaround.
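A quick check suggests two things: sample() indeed has no na.rm argument, and the failure on the NA rows actually comes from seq(), which refuses non-finite bounds before sample() is ever called. A minimal sketch to confirm both points (the variable name res is only for illustration):

```r
# sample() has no na.rm argument: its formals are x, size, replace, prob
print(names(formals(sample)))

# seq() rejects NA bounds, so sample(seq(x, y), 1) errors on rows f and k;
# tryCatch() captures the error message instead of stopping
res <- tryCatch(
  sample(seq(NA_real_, 67), 1),
  error = function(e) conditionMessage(e)
)
print(res)
```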
I've used the same data set as in my previous question, but with a few NA values added.
I'd like to end up with something like the table below (the sampled values are random, so the exact numbers will differ):
ID  start  end  sampled
a      25   67       44
b      36   97       67
c      23   85       77
d      15   67       52
e      21   52       41
f      NA   NA       NA
g      39   55       49
h      27   62       35
i      11   99       17
j      21   89       66
k      NA   NA       NA
l      44   58       48
m      16   77       22
n      25   88       65
Here's a sample data set to use:
df <- structure(list(ID = c("a", "b", "c", "d", "e", "f", "g", "h",
"i", "j", "k", "l", "m", "n"), start = c(25, 36, 23, 15, 21,
NA, 39, 27, 11, 21, NA, 44, 16, 25), end = c(67, 97, 85, 67,
52, NA, 55, 62, 99, 89, NA, 58, 77, 88), sampled = c(NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L), spec = structure(list(
cols = list(ID = structure(list(), class = c("collector_character",
"collector")), start = structure(list(), class = c("collector_double",
"collector")), end = structure(list(), class = c("collector_double",
"collector")), sampled = structure(list(), class = c("collector_logical",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
A simple way is to check for NA values inside the function passed to mapply:
df$sampled <- mapply(function(x, y) {
  # Return NA if either bound is missing; otherwise draw one value from x:y
  if (is.na(x) || is.na(y)) NA else sample(seq(x, y), 1)
}, df$start, df$end)
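Another option that also keeps every row in place is to build a logical index of the complete rows and run the original mapply call only on those, so the incomplete rows simply keep their NA. A minimal sketch, assuming a plain data.frame rebuilt from the question's data with sampled initialised as NA_real_ (the index name ok is hypothetical):

```r
set.seed(1)

# Plain data.frame version of the question's data (the readr spec
# attributes are not needed for this sketch)
df <- data.frame(
  ID      = letters[1:14],
  start   = c(25, 36, 23, 15, 21, NA, 39, 27, 11, 21, NA, 44, 16, 25),
  end     = c(67, 97, 85, 67, 52, NA, 55, 62, 99, 89, NA, 58, 77, 88),
  sampled = NA_real_
)

# TRUE where both bounds are present; no rows are removed from df
ok <- !is.na(df$start) & !is.na(df$end)

# Sample only on the complete rows; rows f and k keep their NA
df$sampled[ok] <- mapply(function(x, y) sample(seq(x, y), 1),
                         df$start[ok], df$end[ok])

print(df)
```

Because the index is only used on the right-hand side of the assignment, df keeps all 14 rows and any later code that refers to rows f or k still finds them.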
Or, since this is part of a larger loop that uses j to index rows:
df[j,4] <- mapply(function(x, y) {
  # Same NA check, applied to the row(s) selected by j
  if (is.na(x) || is.na(y)) NA else sample(seq(x, y), 1)
}, df[j,"start"], df[j,"end"])
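One caveat with sample(seq(x, y), 1): if a row ever has start equal to end, seq(x, y) has length one and sample() treats a single number n >= 1 as 1:n, so the draw would not be x. A sketch of a hypothetical helper, sample_between(), that handles both NA bounds and that case, assuming whole-number bounds with start <= end, used in a j loop like the question's:

```r
set.seed(42)

# Hypothetical helper: one whole number drawn uniformly from [x, y], or NA
# if either bound is missing. Assumes x and y are whole numbers with x <= y.
sample_between <- function(x, y) {
  if (is.na(x) || is.na(y)) return(NA_real_)
  # sample.int(n, 1) draws from 1:n, so shift the draw back onto x:y;
  # when x == y, n is 1 and this always returns x
  x + sample.int(y - x + 1, 1) - 1
}

# Small illustrative data: a normal row, an NA row, and a start == end row
df <- data.frame(start = c(25, NA, 40), end = c(67, NA, 40), sampled = NA_real_)

# Row-by-row loop, mirroring the question's use of j
for (j in seq_len(nrow(df))) {
  df$sampled[j] <- sample_between(df$start[j], df$end[j])
}

print(df)
```

The same helper also works in the vectorised form, e.g. mapply(sample_between, df$start, df$end).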