R - Sample consecutive series of dates in time series without replacement?

I have a data frame in R containing a series of dates. The earliest date is (ISO format) 2015-03-22 and the latest date is 2016-01-03, but there are two breaks within the data. Here is what it looks like:

library(tidyverse)
library(lubridate)

date_data <- tibble(dates = c(seq(ymd("2015-03-22"),
                                  ymd("2015-07-03"),
                                  by = "days"),
                              seq(ymd("2015-08-09"),
                                  ymd("2015-10-01"),
                                  by = "days"),
                              seq(ymd("2015-11-12"),
                                  ymd("2016-01-03"),
                                  by = "days")),
                    sample_id = 0L)

I.e.:

> date_data
# A tibble: 211 x 2
   dates      sample_id
   <date>         <int>
 1 2015-03-22         0
 2 2015-03-23         0
 3 2015-03-24         0
 4 2015-03-25         0
 5 2015-03-26         0
 6 2015-03-27         0
 7 2015-03-28         0
 8 2015-03-29         0
 9 2015-03-30         0
10 2015-03-31         0
# … with 201 more rows

What I want to do is to take ten 10-day-long samples of continuous dates from within that time series without replacement. For example, a valid sample would be the ten days from 2015-04-01 to 2015-04-10 because that falls completely within the dates column in my date_data data frame. Each sample would then get a unique (non-zero) number in the sample_id column in date_data such as 1:10.

To be clear, my requirements are:

Each sample would be 10 consecutive days.
The sampling has to be without replacement. So if sample_id == 1 is the 2015-04-01 to 2015-04-10 period, those dates can't be part of another 10-day-long sample.
Each 10-day-long sample can't include any date that's not within date_data$dates.

At the end, date_data$sample_id would have unique numbers representing each 10-day-long sample, likely with lots of 0s left over that were not part of any sample (and there would be 100 rows, 10 for each of the ten samples, where sample_id != 0).

I am aware of dplyr::sample_n() but it doesn't sample consecutive values, and I don't know how to devise a way to "remember" which dates have already been sampled...

What's a good way to do this? A for loop?!?! Or perhaps something with purrr? Thank you very much for your help.

UPDATE: @gfgm's solution reminded me that performance is an important consideration. My real dataset is quite a bit larger, and in some cases I would want to take 20+ samples instead of just 10. Ideally the length of each sample can be changed as well, i.e. not necessarily 10 days long.

Solution

This is tricky, as you anticipated, because of the requirement of sampling without replacement. I have a working solution below which achieves a random sample and works fast on a problem of the scale given in your toy example. It should also be fine with more observations, but it will get very slow if the samples you need to pick cover a large share of the rows.

The basic premise is to pick n = 10 start points, build the 10-row index vector running forward from each point, and, if any of the vectors overlap, ditch them all and pick again. This is simple and works fine as long as 10*n << nrow(df). If you wanted to get 15 subvectors out of your 211 observations this would be a good deal slower.
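To put a rough number on that slowdown: if the breaks in the dates are ignored (as the functions in this answer do), the chance that a single random draw of n start rows has no overlaps can be computed exactly with a standard counting argument. This is only an illustrative sketch; accept_prob is a hypothetical helper name, and 1 / accept_prob(...) is the expected number of draws before one is accepted.

```r
# Hypothetical helper: probability that n start rows drawn without
# replacement give n non-overlapping windows of `out` rows each.
accept_prob <- function(n_rows, n, out) {
  last_start <- n_rows - out + 1          # number of possible start rows
  # Sets of n starts that are pairwise at least `out` apart correspond
  # one-to-one to plain n-subsets of a range shortened by (n - 1) * (out - 1)
  m <- last_start - (n - 1) * (out - 1)
  if (m < n) return(0)                    # not enough room for n windows
  choose(m, n) / choose(last_start, n)
}

p10 <- accept_prob(211, n = 10, out = 10)
p15 <- accept_prob(211, n = 15, out = 10)
c(p10 = p10, p15 = p15, expected_draws_n10 = 1 / p10)
```

For the toy data this works out to roughly a 0.5% chance per draw with n = 10, so a couple of hundred redraws on average, whereas with n = 15 the chance is several orders of magnitude lower.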

library(tidyverse)
library(lubridate)

date_data <- tibble(dates = c(seq(ymd("2015-03-22"),
                                  ymd("2015-07-03"),
                                  by = "days"),
                              seq(ymd("2015-08-09"),
                                  ymd("2015-10-01"),
                                  by = "days"),
                              seq(ymd("2015-11-12"),
                                  ymd("2016-01-03"),
                                  by = "days")),
                    sample_id = 0L)

# A function that picks n start indices, projects each forward
# `out` rows, and resamples if any of the segments overlap
pick_n_vec <- function(df, n = 10, out = 10) {
  points <- sample(nrow(df) - (out - 1), n, replace = FALSE)
  vecs <- lapply(points, function(i){i:(i+(out - 1))})

  # An index that appears more than once means two segments overlap
  while (max(table(unlist(vecs))) > 1) {
    points <- sample(nrow(df) - (out - 1), n, replace = FALSE)
    vecs <- lapply(points, function(i){i:(i+(out - 1))})
  }

  vecs
}

# demonstrate
set.seed(42)
indices <- pick_n_vec(date_data)

for (i in seq_along(indices)) {
  date_data$sample_id[indices[[i]]] <- i
}

date_data[indices[[1]], ]
#> # A tibble: 10 x 2
#>         dates sample_id
#>        <date>     <int>
#>  1 2015-05-31         1
#>  2 2015-06-01         1
#>  3 2015-06-02         1
#>  4 2015-06-03         1
#>  5 2015-06-04         1
#>  6 2015-06-05         1
#>  7 2015-06-06         1
#>  8 2015-06-07         1
#>  9 2015-06-08         1
#> 10 2015-06-09         1
table(date_data$sample_id)
#> 
#>   0   1   2   3   4   5   6   7   8   9  10 
#> 111  10  10  10  10  10  10  10  10  10  10

Created on 2019-01-16 by the reprex package (v0.2.1)
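One caveat worth checking: pick_n_vec() works on row positions rather than on the dates themselves, so a segment can straddle one of the breaks in the data (rows 104 and 105 are 2015-07-03 and 2015-08-09). Below is a minimal sketch of a check for that in base R. check_samples is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: for every non-zero sample_id, TRUE if its dates
# form a single unbroken run of exactly `out` consecutive days.
check_samples <- function(dates, sample_id, out = 10) {
  ids <- sort(unique(sample_id[sample_id != 0]))
  ok <- vapply(ids, function(id) {
    d <- sort(dates[sample_id == id])
    length(d) == out && all(diff(as.numeric(d)) == 1)
  }, logical(1))
  names(ok) <- ids
  ok
}

# Small illustration with a break between 2015-07-03 and 2015-08-09
d <- as.Date(c("2015-07-01", "2015-07-02", "2015-07-03",
               "2015-08-09", "2015-08-10", "2015-08-11"))
res_ok  <- check_samples(d, c(1L, 1L, 1L, 2L, 2L, 2L), out = 3)  # unbroken runs
res_bad <- check_samples(d, c(0L, 1L, 1L, 1L, 0L, 0L), out = 3)  # straddles the break
```

On the real data the call would be check_samples(date_data$dates, date_data$sample_id); any FALSE marks a sample that should be redrawn.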

A marginally faster version, which compares only the sorted start points instead of tabulating every index:

pick_n_vec2 <- function(df, n = 10, out = 10) {
  points <- sample(nrow(df) - (out - 1), n, replace = FALSE)
  # Two segments overlap exactly when their sorted start points
  # are fewer than `out` rows apart
  while (min(diff(sort(points))) < out) {
    points <- sample(nrow(df) - (out - 1), n, replace = FALSE)
  }
  lapply(points, function(i){i:(i+(out - 1))})
}
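If segments must never cross a break in the dates, or if redrawing becomes too slow for many samples, one alternative is to draw the start rows one at a time from the rows that can still begin a full segment. This is a sketch under stated assumptions, not a drop-in replacement: pick_n_runs is a hypothetical name, the dates are assumed to be sorted and unique, and the result is not uniformly distributed over all valid sets of segments the way the redraw-everything approach is.

```r
# Hypothetical sketch: sequentially pick n non-overlapping runs of `out`
# consecutive days. Assumes `dates` is sorted with no duplicates.
pick_n_runs <- function(dates, n = 10, out = 10) {
  d <- as.numeric(dates)
  last_start <- length(d) - out + 1
  if (last_start < 1) stop("fewer than `out` dates available")
  starts <- seq_len(last_start)
  # A row can start a run only if the row `out - 1` places later is
  # exactly `out - 1` days later, i.e. there is no break inside the run
  valid <- starts[d[starts + out - 1] - d[starts] == out - 1]
  picked <- integer(0)
  for (k in seq_len(n)) {
    if (length(valid) == 0) stop("ran out of room; retry or lower n")
    # sample.int on the length avoids sample()'s surprise when only
    # one candidate is left
    s <- valid[sample.int(length(valid), 1)]
    picked <- c(picked, s)
    # Drop every start whose run would overlap the one just chosen
    valid <- valid[abs(valid - s) >= out]
  }
  lapply(picked, function(i) i:(i + out - 1))
}

# Usage on the same dates as in the question
dates <- c(seq(as.Date("2015-03-22"), as.Date("2015-07-03"), by = "day"),
           seq(as.Date("2015-08-09"), as.Date("2015-10-01"), by = "day"),
           seq(as.Date("2015-11-12"), as.Date("2016-01-03"), by = "day"))
set.seed(42)
idx <- pick_n_runs(dates, n = 10, out = 10)
```

Each pick removes at most 2 * out - 1 candidate starts, so the loop does a bounded amount of work per sample instead of redrawing everything after a collision.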