
Looping a loop to do everything at once


I am trying to simulate the following "game":

  • There is a population of 100 units
  • You randomly sample 10 of these units, record the IDs of the units you saw, and then put them back into the population
  • You then take a second sample, record the IDs of the units you saw in this second sample (together with those from the first sample), and then put the second sample back into the population
  • Repeat this many times

I wrote the following code in R that performs the above procedure:

library(dplyr)

var_1 = rnorm(100,10,10)
var_2 = rnorm(100,1,10)
var_3 = rnorm(100,5,10)
response = rnorm(100,1,1)

my_data = data.frame(var_1, var_2, var_3, response)
my_data$id = 1:100


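# one data frame per iteration: the 10 rows drawn in that iteration
results <- list()

for (i in 1:100) {
    iteration_i = i
    # draw 10 rows without replacement; my_data is not modified, so units are effectively put back
    sample_i = my_data[sample(nrow(my_data), 10), ]
    results_tmp = data.frame(iteration_i, sample_i)
    results[[i]] <- results_tmp
}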

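# stack the per-iteration samples into one data frame
results_df <- do.call(rbind.data.frame, results)

# keep, for each id, only the iteration in which it was first seen
test_1 <- data.frame(results_df %>% 
    group_by(id) %>% 
    filter(iteration_i == min(iteration_i)) %>% 
    distinct)

# number of previously unseen units observed in each iteration
summary_file = data.frame(test_1 %>% group_by(iteration_i) %>% summarise(Count = n()))

cumulative = cumsum(summary_file$Count)

summary_file$Cumulative = cumulative

summary_file$unobserved = 100 - cumulative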

The result looks something like this:

> summary_file
   iteration_i Count Cumulative unobserved
1            1    10         10         90
2            2     8         18         82
3            3     9         27         73
4            4     8         35         65
5            5     6         41         59
6            6     5         46         54
7            7     7         53         47
8            8     7         60         40
9            9     4         64         36
10          10     3         67         33
11          11     4         71         29
12          12     4         75         25
13          13     1         76         24
14          14     4         80         20
15          15     1         81         19
16          16     2         83         17
17          17     2         85         15
18          18     1         86         14
19          20     1         87         13
20          22     1         88         12
21          23     2         90         10
22          24     1         91          9
23          25     1         92          8
24          27     2         94          6
25          28     1         95          5
26          30     1         96          4
27          35     1         97          3
28          37     1         98          2
29          44     1         99          1
30          46     1        100          0
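For comparison, the "first iteration in which each id was seen" step that the dplyr pipeline performs can also be sketched in base R. This is a minimal sketch, not the OP's code: `first_seen_summary()` is a hypothetical helper name, and it assumes the input has one row per sampled unit with columns `iteration_i` and `id`, as `results_df` does. Sorting by iteration and keeping the first occurrence of each `id` plays the role of `filter(iteration_i == min(iteration_i))`.

```r
# Hypothetical helper (not from the question): rebuilds the columns of
# summary_file from a data frame with one row per sampled unit, assumed
# to have columns iteration_i and id (like results_df).
first_seen_summary <- function(results_df, n_units = 100) {
  # sort by iteration so the first row for each id is its earliest sighting
  results_df <- results_df[order(results_df$iteration_i), ]
  first <- results_df[!duplicated(results_df$id), ]
  # number of previously unseen units observed in each iteration
  counts <- table(first$iteration_i)
  out <- data.frame(iteration_i = as.integer(names(counts)),
                    Count       = as.integer(counts))
  out$Cumulative <- cumsum(out$Count)
  out$unobserved <- n_units - out$Cumulative
  out
}

# toy input: 4 units, 3 iterations, 2 units drawn per iteration
toy <- data.frame(iteration_i = c(1, 1, 2, 2, 3, 3),
                  id          = c(1, 2, 2, 3, 1, 4))
first_seen_summary(toy, n_units = 4)
```

On the toy input, iteration 1 sees units 1 and 2, iteration 2 adds unit 3, and iteration 3 adds unit 4, so `Count` is 2, 1, 1 and `unobserved` falls to 0.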

I would now like to repeat this "game" many times.

  • I would like to keep the "summary_file" for each "game" (e.g. summary_file_1, summary_file_2, summary_file_3, etc.)

  • I would then like to create a "total" summary file that shows the number of iterations that were required in each game to observe all units.

This total_summary_file would look something like this:

 game_id iterations_required
1  game_1                  47
2  game_2                  45
3  game_3                  44
4  game_4                  42
5  game_5                  42
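One way to get a table shaped like `total_summary_file` is to reduce each game to a single number, the iteration at which the last unit is first seen, and then repeat that over games. The sketch below uses base R rather than the question's dplyr pipeline; `game_iterations_required()` is a hypothetical name. Like the question's code, it assumes a fixed budget of 100 iterations per game, and it returns `NA` if some unit is never drawn within that budget.

```r
# Hypothetical helper: plays one game with a fixed number of iterations
# (as in the question) and returns the iteration at which the last unit
# was first observed, or NA if some unit was never drawn.
game_iterations_required <- function(n_units = 100, sample_size = 10,
                                     max_iter = 100) {
  first_seen <- rep(NA_integer_, n_units)  # iteration each unit was first seen
  for (i in seq_len(max_iter)) {
    ids <- sample(n_units, sample_size)    # without replacement within a draw
    new_ids <- ids[is.na(first_seen[ids])]
    first_seen[new_ids] <- i               # units are "put back": nothing is removed
  }
  if (anyNA(first_seen)) NA_integer_ else max(first_seen)
}

set.seed(42)  # arbitrary seed, for reproducibility only
n_games <- 5
total_summary_file <- data.frame(
  game_id = paste0("game_", seq_len(n_games)),
  iterations_required = vapply(seq_len(n_games),
                               function(g) game_iterations_required(),
                               integer(1)),
  stringsAsFactors = FALSE
)
total_summary_file
```

Returning `NA` instead of silently reporting 100 keeps games that never observed every unit distinguishable from games that finished exactly at the last iteration.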

Currently, I am copy/pasting my earlier code several times and storing the results, then appending everything at the end and calculating the summary statistics. I am trying to find a way to "loop the loop" and do everything at once. I do not know whether it is possible to put a command like "results_df_i <- do.call(rbind.data.frame, results_i)" inside the loop and create everything at the same time, instead of copy/pasting the earlier loop by hand.
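Rather than creating separate objects such as `summary_file_1`, `summary_file_2` and so on, a common R pattern is to keep one summary per game in a named list built with `lapply()`, so that the `results_df_i`-style names become list elements. A minimal sketch follows; `simulate_game()` is a hypothetical stand-in for the question's code (it returns the same four summary columns), and it finds each unit's first iteration with `match()` on the concatenated draws instead of dplyr.

```r
# Hypothetical stand-in for the question's loop: returns one game's
# summary_file (iteration_i, Count, Cumulative, unobserved).
simulate_game <- function(n_units = 100, sample_size = 10, n_iter = 100) {
  draws <- lapply(seq_len(n_iter), function(i) sample(n_units, sample_size))
  all_ids <- unlist(draws)  # draws concatenated in iteration order
  # position of each unit's first appearance -> iteration it was first seen
  first_iter <- ceiling(match(seq_len(n_units), all_ids) / sample_size)
  counts <- table(first_iter)  # units never drawn (NA) are not counted
  out <- data.frame(iteration_i = as.integer(names(counts)),
                    Count       = as.integer(counts))
  out$Cumulative <- cumsum(out$Count)
  out$unobserved <- n_units - out$Cumulative
  out
}

set.seed(1)  # arbitrary seed
n_games <- 3
summary_files <- lapply(seq_len(n_games), function(g) simulate_game())
names(summary_files) <- paste0("summary_file_", seq_len(n_games))

# each game's summary is available by name, e.g. summary_files$summary_file_2
str(summary_files, max.level = 1)
```

Keeping the games in one list also means they can later be combined with a single `do.call(rbind, ...)` call rather than one call per copied block.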


Solution

  • OP here! I think I was able to find an answer to my own question:

    library(dplyr)
    
    var_1 <- rnorm(100, 10, 10)
    var_2 <- rnorm(100, 1, 10)
    var_3 <- rnorm(100, 5, 10)
    response <- rnorm(100, 1, 1)
    my_data <- data.frame(var_1, var_2, var_3, response)
    my_data$id <- 1:100
    
    # plays one game of 100 iterations and returns its summary_file
    simulate <- function() {
      results <- list()
      for (i in 1:100) {
        iteration_i <- i
        # draw 10 units without replacement; my_data is not modified, so units are effectively put back
        sample_i <- my_data[sample(nrow(my_data), 10), ]
        results_tmp <- data.frame(iteration_i, sample_i)
        results[[i]] <- results_tmp
      }
      results_df <- do.call(rbind.data.frame, results)
      # first iteration in which each id was seen
      test_1 <- data.frame(results_df %>% 
                             group_by(id) %>% 
                             filter(iteration_i == min(iteration_i)) %>% 
                             distinct)
      # new units per iteration, running total, and units still unseen
      summary_file <- data.frame(test_1 %>% 
                                   group_by(iteration_i) %>% 
                                   summarise(Count=n()))
      cumulative <- cumsum(summary_file$Count)
      summary_file$Cumulative <- cumulative
      summary_file$unobserved <- 100 - cumulative
      return(summary_file)
    }
    
    # now, loop 10 times!
    
    results <- list()
    for (i in 1:10) { 
      game_i <- i
      s_i <- simulate()
      results_tmp <- data.frame(game_i, s_i)
      results[[i]] <- results_tmp
    }
    
    final_file <- do.call(rbind.data.frame, results)
    

    Thanks for your help everyone!
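The loop in the solution stores every game's summary in `final_file` but does not yet produce the `total_summary_file` asked for in the question. Because each summary row adds at least one new unit, `unobserved` reaches 0 at most once per game, so that table can be read off the rows where `unobserved == 0`. Below is a minimal base-R sketch; `make_total_summary()` is a hypothetical helper, and the small `final_file` is made-up input with the same columns as the real one. Games that never reach 0 within 100 iterations are simply absent from the result.

```r
# Hypothetical helper: one row per game that observed every unit, giving
# the iteration at which the last unit was first seen.
make_total_summary <- function(final_file) {
  done <- final_file[final_file$unobserved == 0, ]
  data.frame(game_id             = paste0("game_", done$game_i),
             iterations_required = done$iteration_i,
             stringsAsFactors    = FALSE)
}

# made-up input with the same columns as final_file (4 units, 2 games)
final_file <- data.frame(
  game_i      = c(1, 1, 1, 2, 2),
  iteration_i = c(1, 2, 4, 1, 3),
  Count       = c(2, 1, 1, 3, 1),
  Cumulative  = c(2, 3, 4, 3, 4),
  unobserved  = c(2, 1, 0, 1, 0)
)
make_total_summary(final_file)
```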