
How to randomly sample within group with condition in R


I ran into a brick wall while trying to generate a dataset. Here is my problem.

I am creating the dataset with this code:

library(truncnorm)   # provides rtruncnorm()
library(DescTools)   # provides Desc()

time0 <- rtruncnorm(100,a = 0, mean = 0.525, sd = 0.925 - 0.525)
Desc(time0)

time0.dat <- data.frame(time = "0", DV= time0)

time1 <- rnorm(100, mean = 10.65, sd = 13.025 - 10.65)
Desc(time1)
time1.dat <- data.frame(time = "1", DV= time1)

time2 <- rnorm(100, mean = 11.025, sd = 14.175 - 11.025)
Desc(time2)
time2.dat <- data.frame(time = "2", DV= time2)

time3 <- rnorm(100, mean = 5.95, sd = 8.175 - 5.95)
Desc(time3)
time3.dat <- data.frame(time = "3", DV= time3)

time4 <- rnorm(100, mean = 3.8, sd = 4.375 - 3.8)
Desc(time4)
time4.dat <- data.frame(time = "4", DV= time4)

time5 <-  rtruncnorm(100,a = 0, mean = 2.075, sd = 2.75 - 2.075)
Desc(time5)
time5.dat <- data.frame(time = "5", DV= time5)

time6 <- rtruncnorm(100,a = 0, mean = 1.225, sd = 1.625 - 1.225) 
Desc(time6)
time6.dat <- data.frame(time = "6", DV= time6)

time7 <- rtruncnorm(100,a = 0, mean = 0.725, sd = 1.05 - 0.725) 
Desc(time7)
time7.dat <- data.frame(time = "7", DV= time7)

time8 <- rtruncnorm(100,a = 0, mean = 0.275, sd = 0.575 - 0.275) 
Desc(time8)
time8.dat <- data.frame(time = "8", DV= time8)

ctdat <- rbind(time0.dat,time1.dat,time2.dat,time3.dat,time4.dat,time5.dat,time6.dat,time7.dat,time8.dat)

I want to randomly sample one value from each "time", subject to the condition that the value at time 2 > time 3, time 3 > time 4, ..., time 7 > time 8. I then want to assign an ID to each set of sampled times.

My desired dataset after sampling should look like this:

[image: desired output]

Thank you. Really appreciate!


Solution

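  • How about something like this:

    library(dplyr)

    out <- NULL
    nsamples <- 2                      # number of IDs to generate
    untime <- unique(ctdat$time)
    j <- 1
    while(j <= nsamples){
      tmp <- NULL
      for(i in seq_along(untime)){
        # default: draw one row at random from this time point
        x <- ctdat %>% filter(time == untime[i]) %>% sample_n(1)
        # from time 3 onward, DV must be below the DV drawn for the previous time
        # (time is stored as character, so convert it before comparing)
        if(as.numeric(untime[i]) > 2){
          f <- ctdat %>% filter(time == untime[i] & DV < tmp$DV[(i-1)])
          if(nrow(f) == 0)break        # no valid candidate: discard this attempt and retry
          x <- f %>% sample_n(1)
        }
        tmp <- rbind(tmp, x)
      }
      # keep the attempt only if every time point was filled
      if(nrow(tmp) == length(untime)){
        tmp$ID <- j
        out <- rbind(out, tmp)
        j <- j+1
      }
    }
    out <- out %>% select(ID, time, DV)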
    
    #    ID time         DV
    # 1   1    0  1.2771383
    # 2   1    1  6.5078257
    # 3   1    2 12.9128808
    # 4   1    3  4.1991406
    # 5   1    4  4.1933681
    # 6   1    5  2.7821423
    # 7   1    6  1.1044560
    # 8   1    7  0.9538192
    # 9   1    8  0.7632612
    # 10  2    0  1.1390608
    # 11  2    1  8.4283165
    # 12  2    2  8.4993436
    # 13  2    3  4.0232520
    # 14  2    4  3.4199055
    # 15  2    5  2.0761036
    # 16  2    6  1.1407558
    # 17  2    7  0.5703776
    # 18  2    8  0.1484522
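
    To confirm that a sampled dataset satisfies the ordering condition, a quick check along these lines might help. This is only a sketch in base R: `check_decreasing` is a hypothetical helper name (not part of the original answer), and it assumes the result has the `ID`, `time` and `DV` columns shown above, with `time` stored as character.

```r
# Hypothetical helper: for each ID, TRUE if DV is strictly decreasing
# from time 2 through the last time point.
check_decreasing <- function(out) {
  sapply(split(out, out$ID), function(d) {
    d  <- d[order(as.numeric(d$time)), ]      # time is character, so sort numerically
    dv <- d$DV[as.numeric(d$time) >= 2]       # the condition only applies from time 2 on
    all(diff(dv) < 0)
  })
}

# Demo data: the rows for ID 1 copied from the output above
demo <- data.frame(
  ID   = 1,
  time = as.character(0:8),
  DV   = c(1.2771383, 6.5078257, 12.9128808, 4.1991406, 4.1933681,
           2.7821423, 1.1044560, 0.9538192, 0.7632612)
)

check_decreasing(demo)
```

    Calling `check_decreasing(out)` on the full result returns one logical value per ID, so `all(check_decreasing(out))` can serve as a single pass/fail check.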