
tidyverse: Simulating a random sample with a nested factor


I want to simulate a random sample with a nested factor. Factor Dept has two levels, A and B. Level A has two nested levels, A1 and A2, and level B has three nested levels, B1, B2 and B3. I want to simulate a random sample for every day from 2022-01-01 to 2022-01-31 using R. Part of the desired output is given below (2022-01-01 to 2022-01-03 only, for reference).

library(tibble)

set.seed(12345)
df1 <-
  tibble(
    Date   = c(rep("2022-01-01", 5), rep("2022-01-02", 4), rep("2022-01-03", 4))
  , Dept   = c("A", "A", "B", "B", "B", "A", "B", "B", "B", "A", "A", "B", "B")
  , Prog   = c("A1", "A2", "B1", "B2", "B3", "A1", "B1", "B2", "B3", "A1", "A2", "B2", "B3")
  , Amount = runif(n = 13, min = 50000, max = 100000) 
  )

df1
#> # A tibble: 13 x 4
#>    Date       Dept  Prog  Amount
#>    <chr>      <chr> <chr>  <dbl>
#>  1 2022-01-01 A     A1    86045.
#>  2 2022-01-01 A     A2    93789.
#>  3 2022-01-01 B     B1    88049.
#>  4 2022-01-01 B     B2    94306.
#>  5 2022-01-01 B     B3    72824.
#>  6 2022-01-02 A     A1    58319.
#>  7 2022-01-02 B     B1    66255.
#>  8 2022-01-02 B     B2    75461.
#>  9 2022-01-02 B     B3    86385.
#> 10 2022-01-03 A     A1    99487.
#> 11 2022-01-03 A     A2    51727.
#> 12 2022-01-03 B     B2    57619.
#> 13 2022-01-03 B     B3    86784.
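The nesting described in the question (A1 and A2 under A; B1, B2 and B3 under B) can be written down directly as a small lookup table of valid Dept/Prog pairs, which is then crossed with every date. This is a minimal sketch, assuming dplyr >= 1.1.0 (for cross_join()); the names valid_progs, dates and grid are illustrative only.

```r
library(dplyr)   # cross_join() requires dplyr >= 1.1.0; tibble() is re-exported by dplyr

# Valid Dept/Prog pairs: A has two nested programs, B has three
valid_progs <- tibble(
  Dept = c("A", "A", "B", "B", "B"),
  Prog = c("A1", "A2", "B1", "B2", "B3")
)

# Every day of January 2022
dates <- tibble(Date = seq(as.Date("2022-01-01"), as.Date("2022-01-31"), by = "1 day"))

# Cartesian product: each date paired with each valid Dept/Prog pair
grid <- cross_join(dates, valid_progs)
grid
```

This yields 31 x 5 = 155 rows, one per date and program, without ever creating an A3 row that has to be filtered out afterwards; random amounts and per-date subsetting can then be layered on top.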

Solution

  • To sample randomly, create the expanded data with crossing, drop the combination that does not exist ('A3'), and then slice a random number of rows (the first k, for a randomly drawn k) for each 'Date'

    library(dplyr)
    library(tidyr)
    library(stringr)

    # all Date x Dept x program-number combinations
    crossing(Date = seq(as.Date("2022-01-01"), as.Date("2022-01-31"), by = "1 day"),
             Dept = c("A", "B"),
             Prog = 1:3) %>%
      mutate(Prog = str_c(Dept, Prog)) %>%   # A1..A3, B1..B3
      filter(Prog != "A3") %>%               # Dept A has only A1 and A2
      mutate(Amount = runif(n = n(), min = 50000, max = 100000)) %>%
      group_by(Date) %>%
      # draw k from 1..n() and keep the first k rows of each date
      slice(seq_len(sample(row_number(), 1))) %>%
      ungroup()
    

    -output

    # A tibble: 102 × 4
       Date       Dept  Prog  Amount
       <date>     <chr> <chr>  <dbl>
     1 2022-01-01 A     A1    83964.
     2 2022-01-01 A     A2    93428.
     3 2022-01-01 B     B1    85187.
     4 2022-01-01 B     B2    79144.
     5 2022-01-01 B     B3    65784.
     6 2022-01-02 A     A1    86014.
     7 2022-01-03 A     A1    76060.
     8 2022-01-03 A     A2    56412.
     9 2022-01-03 B     B1    87365.
    10 2022-01-03 B     B2    66169.
    # … with 92 more rows
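Because seq_len(k) always selects rows 1..k, the slice() step above keeps a leading run of programs for each date (A1, then A2, then B1, ...), so a date can never contain, for example, B2 without A1. The desired output in the question drops rows from the middle (2022-01-02 has no A2, 2022-01-03 has no B1). If that behaviour is wanted, one alternative (a sketch, not the answer's original method) is to keep a random subset of random size within each date. It uses base paste0() in place of str_c() and an arbitrary seed purely for reproducibility; the name sim is illustrative.

```r
library(dplyr)
library(tidyr)

set.seed(12345)  # arbitrary seed, only so the draw is reproducible

sim <- crossing(Date = seq(as.Date("2022-01-01"), as.Date("2022-01-31"), by = "1 day"),
                Dept = c("A", "B"),
                Prog = 1:3) %>%
  mutate(Prog = paste0(Dept, Prog)) %>%  # A1..A3, B1..B3
  filter(Prog != "A3") %>%               # Dept A has only A1 and A2
  mutate(Amount = runif(n = n(), min = 50000, max = 100000)) %>%
  group_by(Date) %>%
  # per date: draw a size k in 1..n(), then keep k randomly chosen rows;
  # filter() preserves the original A1, A2, B1, B2, B3 order within each date
  filter(row_number() %in% sample.int(n(), sample.int(n(), 1))) %>%
  ungroup()

sim
```

Every date still keeps at least one row (k is never 0), but any combination of programs can now appear, including gaps such as A1, B1, B2, B3.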