I would like to sample values under the constraint that any two sampled values are at least window apart. This is akin to sampling days of a year while requiring the sampled days to be at least a fortnight apart. So far I've tried it like this:
check.diff <- TRUE
window <- 14
while (check.diff == TRUE) {
  sampled.session <- sort(sample(1:365, size = 5, replace = FALSE))
  check.diff <- any(diff(sampled.session) < window)
}
This works nicely if the window constraint is small. If one specifies a rather large value, this can become an infinite loop. While I could insert all sorts of checks and a maximum number of iterations, I was wondering if there's a smarter way of attacking this?
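For reference, the kind of checks mentioned in the question could look roughly like the sketch below. The function name `sample_spaced_reject` and its arguments are hypothetical, not from the original post. It assumes days are numbered 1 to `n_days`, refuses up front when no valid sample can exist (the tightest layout needs `(n_samples - 1) * window` days between the first and last pick), and caps the number of redraws.

```r
# Hypothetical helper, not part of the original post: rejection sampling
# with a feasibility check and an iteration cap.
sample_spaced_reject <- function(n_days = 365, n_samples = 5, window = 14,
                                 max_iter = 10000) {
  # The tightest possible layout spans (n_samples - 1) * window days,
  # which must fit between day 1 and day n_days.
  if ((n_samples - 1) * window > n_days - 1) {
    stop("no valid sample exists for this window")
  }
  for (iter in seq_len(max_iter)) {
    candidate <- sort(sample(seq_len(n_days), size = n_samples))
    # Accept the draw only if all neighbouring picks are far enough apart.
    if (all(diff(candidate) >= window)) {
      return(candidate)
    }
  }
  stop("no valid sample found within max_iter draws")
}
```

With 5 samples from 365 days, the largest window that passes the feasibility check is 91. The cap only turns the infinite loop into an error; for windows close to that limit most draws are still rejected.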
One way to do this is by removing candidates from the population from which you take the sample:
set.seed(42)
population <- 1:356
n_samples <- 5
window <- 14
sampled_session <- rep(sample(population, 1), n_samples) # initialize the vector
for (i in seq.int(2, n_samples)) {
  borders <- sampled_session[i - 1] + (window - 1) * c(-1, 1)
  days_in_window <- seq.int(borders[1], borders[2])
  population <- setdiff(population, days_in_window)
  sampled_session[i] <- sample(population, 1)
}
sort(sampled_session)
# [1] 90 193 264 309 326
diff(sort(sampled_session))
# [1] 103 71 45 17
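The same candidate-removal idea can be wrapped in a function with two guards. This is only a sketch: the name `sample_spaced_remove` and its arguments are made up for illustration. It stops with an error when the population runs empty, and it picks by index with `sample.int()` because `sample(x, 1)` treats a single number `x` greater than 1 as the range `1:x`.

```r
# Hypothetical wrapper around the candidate-removal idea; the function name
# and arguments are illustrative, not from the original answer.
sample_spaced_remove <- function(n_days = 365, n_samples = 5, window = 14) {
  population <- seq_len(n_days)
  picks <- integer(n_samples)
  for (i in seq_len(n_samples)) {
    if (length(population) == 0) {
      stop("ran out of candidates; try a smaller window or fewer samples")
    }
    # Index with sample.int() so that a one-element population is not
    # misread by sample() as the range 1:x.
    picks[i] <- population[sample.int(length(population), 1)]
    # Drop every day closer than `window` to the new pick.
    population <- population[abs(population - picks[i]) >= window]
  }
  sort(picks)
}
```

Note that with a large window the population can run empty even though a valid sample exists, simply because the early picks were placed badly, so a caller may want to retry on error.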
Another way would be to drop the days around each pick directly after sampling it:
set.seed(357)
population <- 1:357
n_samples <- 5
window <- 14
sampled.session <- numeric(n_samples)
for (i in seq_len(n_samples)) {
  sampled.session[i] <- pick <- sample(population, 1)
  population <- population[-which(population < pick + window & population > pick - window)]
}
sort(sampled.session)
# [1] 19 39 111 134 267
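A loop-free alternative, which is a different technique from the two answers above, is to sample in a compressed range and then stretch the gaps back out. The sketch below uses a hypothetical function `sample_spaced_shift`. It relies on the fact that sorted picks x_1 < ... < x_n with gaps of at least `window` correspond one-to-one to distinct sorted values y_i = x_i - (i - 1) * (window - 1) in a range that is shorter by `(n_samples - 1) * (window - 1)`.

```r
# Hypothetical helper: draw without any loop by sampling in a compressed
# range and then stretching the gaps back out.
sample_spaced_shift <- function(n_days = 365, n_samples = 5, window = 14) {
  # Each of the (n_samples - 1) gaps needs (window - 1) extra days.
  compressed <- n_days - (n_samples - 1) * (window - 1)
  if (compressed < n_samples) {
    stop("no valid sample exists for this window")
  }
  # Distinct sorted values in the compressed range differ by at least 1 ...
  base <- sort(sample.int(compressed, n_samples))
  # ... so adding (window - 1) per position makes every gap >= window.
  base + (seq_len(n_samples) - 1L) * (window - 1)
}

sample_spaced_shift(window = 91)
# [1]   1  92 183 274 365
```

Because the mapping is one-to-one, every valid set of days is equally likely, and the function never needs a retry: it either returns a valid sample or reports that none exists.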