Search code examples
rsimulationtibblereplicate

Wrangling results from simulations


I'm pretty new to R, and I'm trying to learn how to do some simulations. Currently I have a program that does the following:

  1. In one function, uses a DGP to create fake data, which is returned as a tibble
  2. In another function, randomly assign fake observations to treatment
  3. In the final function, merge random assignment results with fake data and run regression. I return a list that includes the estimate and p-value using the below code
    tauhat <- trobust %>% filter(term=="TTRUE") %>% pull(estimate)
    pvalue <- trobust %>% filter(term=="TTRUE") %>% pull(p.value)
    return(list(tauhat,pvalue))

If I run these functions once, I get something like the below

> finitepop(finiteN=20)
[[1]]
[1] 0.3730686

[[2]]
[1] 0.03445962

I then use replicate to repeat that process say 100 times. I end up with a 2X100 thing - perhaps it's an array? - and I'd like to turn that into a 100X2 tibble, that is, a tibble with columns for the estimate and p-value and simulation results stored as observations. The results from the sim look like

> finitesim <- (replicate(n=reps,finitepop(finiteN=20)))
> finitesim
     [,1]        [,2]      [,3]      [,4]       [,5]      [,6]      [,7]     
[1,] -0.03096849 0.206797  0.2386698 0.09374408 0.1462773 0.2479394 0.2177207
[2,] 0.8850678   0.2622687 0.2105784 0.5990369  0.3279901 0.1063231 0.2489028
     [,8]      [,9]       [,10]       [,11]     [,12]      [,13]     
[1,] 0.1661424 0.00977172 -0.08761129 0.1170922 -0.1559203 0.278062  
[2,] 0.2086819 0.9390261  0.6071284   0.472165  0.4214389  0.05973561

How should I convert the results to a nice tibble?

EDIT: Below is a MWE, where for convenience I changed the right hand side variable to x, and I didn't create the clustering structure for lm_robust

library(tidyverse)
library(lmerTest)                       #for lmer
library(merTools)                       #for lmer
library(estimatr)                       #for cluster robust se


finitepop <- function(finiteN){
    
    fakedata <- tibble(
        id=1:finiteN,
        x=rnorm(n=finiteN),
        y=rnorm(n=finiteN)
        )
    robust <- lm_robust(data=fakedata,y~x,cluster=id)
    trobust <- tidy(robust)
    tauhat <- trobust %>% filter(term=="x") %>% pull(estimate)
    pvalue <- trobust %>% filter(term=="x") %>% pull(p.value)
    return(list(tauhat,pvalue))
    }


finitesim <- (replicate(n=10,finitepop(finiteN=20),simplify=FALSE))

finitesim

Solution

  • Use can use map_df from the purrr package (part of tidyverse):

    finitepop <- function(finiteN){
      
      fakedata <- tibble(
        id=1:finiteN,
        x=rnorm(n=finiteN),
        y=rnorm(n=finiteN)
      )
      robust <- lm_robust(data=fakedata,y~x,cluster=id)
      trobust <- tidy(robust)
      tauhat <- trobust %>% filter(term=="x") %>% pull(estimate)
      pvalue <- trobust %>% filter(term=="x") %>% pull(p.value)
      print(pvalue)
      return(tibble(tauhat,pvalue))
    }
    
    finitesim <- replicate(n=10,finitepop(finiteN=20),simplify=FALSE) %>%
      purrr::map_df(as.data.frame)
    
    > finitesim
             tauhat     pvalue
    1   0.035057186 0.89818890
    2  -0.248569087 0.24159959
    3   0.111054217 0.75700470
    4   0.596779950 0.00223398
    5  -0.004052686 0.98418837
    6  -0.105390590 0.67410417
    7  -0.107913504 0.54778478
    8  -0.021681712 0.89834059
    9  -0.161811559 0.49091499
    10  0.241477999 0.21281508