Search code examples
rloopsdataframenested-loops

Run a pop-gen simulation several times and store the results of each in a new column on a data frame


I have a basic Wright-Fisher simulation for two alleles that works wonderfully, and produces a good looking plot showing the allele fixing or dying out by chance as expected. It exports every generation as calculated into a data frame d, so I have the values at each generation to hand. What I want to do is run the whole thing say 20 times and store each complete simulation in a new column automatically, so I can plot all of them on a ggplot graph with colours and all that good stuff. I'm mostly interested in getting a tidy frame to make good-looking plots for a project rather than breakneck efficiency.

#Wright Fisher model Mk1

#Simulation Parameters
# n = pop.size
# f = frequency of focal allele
# x = number of focal allele, do not set by hand
# y = number of the other allele, do not set by hand
# g = generations desired
n = 200
f = 0.6
x = (n*f)
y = (n-x)
g = 200

#This creates a data frame of the correct size to store each generation

d = data.frame(f = rep(0,g))

#Creates the graph.
plot(1,0, type = "n", xlim = c(1,200), ylim = c(0,n),
     xlab = "Generation", ylab = "Frequency A")

#Creates the population, this model is limited to only two alleles, and can only plot one
alleles<- c(rep("A",x), rep("a",y))

#this is the loop that actually simulates the population
#It has code for plotting each generation on the graph as a point 
#Exports the number of focal allele A to the data frame
for (i in 1:g){ 
  alleles <- sample(alleles, n, replace = TRUE)
points(i, length(alleles[alleles=="A"]), pch = 19, col= "red")
F = sum(alleles == "A")
d[i, ] = c(F)
}

So I want to run that last bit multiple times and store each complete iteration somehow. I know I could loop the function by nesting it, even though this is quick and dirty, but doing this only stores the values of the last iteration of the outer loop.


Solution

  • There's a lot of opportunity for improvement here, but this should get you going. I am only showing five simulations, but you should be able to extend. In essence, place the bulk of your code a function and then you can use either map functions from the purrr package or you could also do something with replicate:

    library(tidyverse)
    
    n = 200
    f = 0.6
    x = (n*f)
    y = (n-x)
    g = 200
    
    d = data.frame(f = rep(0,g))
    
    run_sim <- function() {
      alleles <- c(rep("A", x), rep("a", y))
    
      for (i in 1:g) { 
        alleles <- sample(alleles, n, replace = TRUE)
        cnt_A = sum(alleles == "A")
        d[i, ] = c(cnt_A)
      }
    
      return(d)
    }
    
    sims <- paste0("sim_", 1:5)
    
    set.seed(4) # for reproducibility
    
    sims %>%
      map_dfc(~ run_sim()) %>%
      set_names(sims) %>%
      gather(simulation, results) %>%
      group_by(simulation) %>%
      mutate(period = row_number()) %>%
      ggplot(., aes(x = period, y = results, group = simulation, color = simulation)) +
      geom_line()
    

    Created on 2019-03-21 by the reprex package (v0.2.1)

    NOTE: You could also add arguments to the run_sim function for say x and y (i.e., run_sim <- function(x, y) { ... }), which would allow you to explore other possibilities.