Select random rows until threshold value from other column is reached


I have an sf object in R. It looks like this:

Type   Value   Geometry 
A       1        ()
A       3        ()
B       2        ()
A       1        ()
C       4        ()

The Geometry column stores the polygon geometry of each feature. I want to sample random rows until the sum of the sampled values in the Value column reaches or exceeds a threshold (say, 5).

For example, if rows 1, 4, and 5 are drawn in that order, the running sum is 1, then 2, then 6, so sampling stops after the third row.


Solution

  • You can use a while loop to check the sum on each iteration:

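    library(tidyverse)
    
    df <- tibble(type = c('a', 'a', 'b', 'a', 'c'), value = c(1, 3, 2, 1, 4))
    threshold <- 5
    remaining <- seq_len(nrow(df))  # row indices not drawn yet
    picked <- integer(0)            # row indices drawn so far, in draw order
    sample_sum <- 0
    
    # Draw one row at a time without repeats; stop once the threshold is
    # reached or every row has been used.
    while (sample_sum < threshold && length(remaining) > 0) {
      pos <- sample.int(length(remaining), size = 1)
      picked <- c(picked, remaining[pos])
      remaining <- remaining[-pos]
      sample_sum <- sum(df$value[picked])
    }
    samples <- slice(df, picked)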
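
A loop-free variant of the same idea, as a minimal sketch in base R: shuffle the row indices once, take the cumulative sum of the values in that shuffled order, and keep rows up to the first position where the running total reaches the threshold. Each row is drawn at most once. The helper name `sample_until` and its arguments are hypothetical, not part of any package. It relies only on data.frame-style indexing, which sf objects also support, but it has not been tested on an sf object here.

```r
# Hypothetical helper (an illustration, not from the original answer):
# shuffle once, then cut at the first position where the running total
# reaches the threshold.
sample_until <- function(data, value_col, threshold) {
  shuffled <- sample.int(nrow(data))              # random permutation of row indices
  running <- cumsum(data[[value_col]][shuffled])  # running total in shuffled order
  hit <- which(running >= threshold)
  # If the threshold is never reached, fall back to returning every row.
  n_keep <- if (length(hit) > 0) hit[1] else length(shuffled)
  data[shuffled[seq_len(n_keep)], , drop = FALSE]
}

set.seed(42)  # only for reproducibility of the example
df <- data.frame(type = c("a", "a", "b", "a", "c"), value = c(1, 3, 2, 1, 4))
samples <- sample_until(df, "value", 5)
sum(samples$value)  # at least 5 for this data
```

Because the cut happens at the first position that reaches the threshold, removing the last sampled row always drops the total back below it.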