Search code examples
rdplyrmass

Complex function that iterates mvrnorm over rows of a data frame


I have data that looks like this:

data = data.frame(a.coef = c(.14, .15, .16), 
                  b.coef = c(.4, .5, .6), 
                  a.var = c(0.0937, 0.0934, 0.0945), 
                  b.var = c(0.00453, 0.00564, 0.00624), 
                  ab.cov = c(0.000747, 0.000747, 0.000747))

and I would like to run the following code (source: http://www.quantpsy.org/medmc/medmc.htm) on each row of the data set.

require(MASS)
a = data$a.coef
b = data$b.coef
rep = 10000
conf = 95
pest = c(a, b)
acov <- matrix(c(data$a.var, data$ab.cov, 
                 data$ab.cov, data$b.var), 2, 2)
mcmc <- mvrnorm(rep, pest, acov, empirical = FALSE)
ab <- mcmc[ , 1] * mcmc[ , 2]
low = (1 - conf / 100) / 2
upp = ((1 - conf / 100) / 2) + (conf / 100)
LL = quantile(ab, low)
UL = quantile(ab, upp)
LL4 = format(LL, digits = 4)
UL4 = format(UL, digits = 4)

I've created a relatively simple function that takes the data and the row number as inputs:

MCMAM <- function(data_input, row_number) {
  data = data_input[row_number, ]
  a = data[["a.coef"]]
  b = data[["b.coef"]]
  rep = 10000
  conf = 95
  pest = c(a, b)
  acov <- matrix(c(data[["a.var"]], data[["ab.cov"]], 
                   data[["ab.cov"]], data[["b.var"]]), 2, 2)
  require(MASS)
  mcmc <- mvrnorm(rep, pest, acov, empirical = FALSE)
  ab <- mcmc[, 1] * mcmc[, 2]
  low = (1 - conf / 100) / 2
  upp = ((1 - conf / 100) / 2) + (conf / 100)
  LL = quantile(ab, low)
  UL = quantile(ab, upp)
  return(c(LL, UL))
}

MCMAM(data, 1)

    2.5%      97.5% 
-0.1901272  0.3104614 

But it would be great if there was a way to get rid of the row specification and just have the function run through the data set row by row and save the output to a new column in the data set.

I've been experimenting with for loops and apply functions but haven't had any success, largely because both the matrix() and mvrnorm() functions take values rather than vectors.


Solution

  • We can use lapply

    do.call(rbind, lapply(seq_len(nrow(data)), MCMAM, data_input = data))
    

    -ouptut

        2.5%     97.5%
    [1,] -0.1832449 0.3098362
    [2,] -0.2260856 0.3856575
    [3,] -0.2521126 0.4666583
    

    Or use rowwise

    library(dplyr)
    library(tidyr)
    data %>% 
         rowwise %>% 
         mutate(new = list(MCMAM(cur_data(), 1))) %>%
         unnest_wider(new)
    # A tibble: 3 x 7
    #  a.coef b.coef  a.var   b.var   ab.cov `2.5%` `97.5%`
    #   <dbl>  <dbl>  <dbl>   <dbl>    <dbl>  <dbl>   <dbl>
    #1   0.14    0.4 0.0937 0.00453 0.000747 -0.185   0.309
    #2   0.15    0.5 0.0934 0.00564 0.000747 -0.219   0.396
    #3   0.16    0.6 0.0945 0.00624 0.000747 -0.259   0.472