r, sampling

Sample with a minimum difference between consecutive values


I would like to sample values under the constraint that any two sampled values are at least window apart. Think of sampling days of a year where any two sampled days must be at least a fortnight apart. So far I have tried this:

check.diff <- TRUE
window <- 14
while (check.diff == TRUE) {
    sampled.session <- sort(sample(1:365, size = 5, replace = FALSE))
    check.diff <- any(diff(sampled.session) < window)
}

This works nicely if window is small. With a large window the loop can run for a very long time, and it never terminates if no valid sample exists. I could add all sorts of checks and a maximum number of iterations, but is there a smarter way of attacking this?
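
For reference, the checks and iteration cap mentioned above could be sketched as follows. This is only a minimal sketch: the function name sample_with_retries, its arguments, and the max_tries default are assumptions made for illustration. The feasibility test relies on the fact that size values in 1:n_max need (size - 1) gaps of at least window each.

```r
# Hypothetical helper; the name, arguments and max_tries default are assumptions.
sample_with_retries <- function(n_max, size, window, max_tries = 1000L) {
    # A valid sample needs (size - 1) gaps of at least `window` inside 1:n_max.
    if ((size - 1L) * window > n_max - 1L) {
        stop("window is too large: no valid sample exists")
    }
    for (try_number in seq_len(max_tries)) {
        candidate <- sort(sample.int(n_max, size))
        # Accept the draw only if every consecutive gap is wide enough.
        if (all(diff(candidate) >= window)) {
            return(candidate)
        }
    }
    stop("no valid sample found within max_tries attempts")
}

set.seed(1)
sampled.session <- sample_with_retries(365, size = 5, window = 14)
```

This still rejects and redraws, so it gets slow as window approaches the feasibility limit, but it always terminates.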


Solution

  • One way to do this is to shrink the population as you go: after each draw, remove every candidate that lies within window of the value just drawn, then draw the next value from what is left:

    set.seed(42)
    
    population <- 1:356
    n_samples <- 5
    window <- 14
    
    # initialize the vector; the first element is the first draw
    sampled_session <- rep(sample(population, 1), n_samples)
    
    for (i in seq.int(2, n_samples)) {
        # days closer than `window` to the previous draw
        borders <- sampled_session[i - 1] + (window - 1) * c(-1, 1)
        days_in_window <- seq.int(borders[1], borders[2])
        # remove them, then draw the next value from the remaining candidates
        population <- setdiff(population, days_in_window)
        sampled_session[i] <- sample(population, 1)
    }
    
    sort(sampled_session)
    # [1]  90 193 264 309 326
    
    diff(sort(sampled_session))
    # [1] 103  71  45  17
    

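    Another way is to draw one value at a time and, after each draw, keep only the candidates that are at least window away from it: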

    set.seed(357)
    population <- 1:357
    n_samples <- 5
    window <- 14
    
    sampled.session <- numeric(n_samples)
    for (i in seq_len(n_samples)) {
        sampled.session[i] <- pick <- sample(population, 1)
        # keep only candidates at least `window` away from the new pick
        population <- population[abs(population - pick) >= window]
    }
    sort(sampled.session)
    # [1]  19  39 111 134 267
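
Both approaches above draw one value at a time, so with a large window they can run out of candidates even when a valid sample exists. A different technique, not taken from the answers above, avoids this: draw the values from a shortened range without any spacing constraint, sort them, and then push the i-th value up by (i - 1) * (window - 1). A minimal sketch follows; the function name sample_spaced and its arguments are assumptions made for illustration.

```r
# Hypothetical helper; the name and arguments are assumptions for illustration.
sample_spaced <- function(n_max, n_samples, window) {
    # Size of the range once the mandatory gaps have been taken out.
    reduced <- n_max - (n_samples - 1L) * (window - 1L)
    if (reduced < n_samples) {
        stop("window is too large: no valid sample exists")
    }
    # Distinct sorted values in 1:reduced differ by at least 1 ...
    compressed <- sort(sample.int(reduced, n_samples))
    # ... so adding (window - 1) per position makes them differ by >= window.
    compressed + (seq_len(n_samples) - 1L) * (window - 1L)
}

set.seed(1)
spaced <- sample_spaced(365, n_samples = 5, window = 14)
diff(spaced)  # every difference is at least 14
```

There is no loop and no rejection step, so the call either returns a valid sample or fails immediately when the constraint cannot be met. Because the shift maps each set of distinct values in the shortened range to exactly one valid sample, every valid sample is equally likely.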