Search code examples
rdataframecorrelation

Automatically compute and store a lot of variables in a dataframe


I have a list called res.Sigma containing 100 covariance matrices of order 4*4, where each matrix corresponds to a fixed time. The values of the time variable are stored in a variable called X.

Now, we can compute the time-varying sample variances and store them in a dataframe as follows:

sample.var <- data.frame(X = X,
                         v11 = res.Sigma %>% lapply(function(val) val[1,1]) %>% unlist(),
                         v22 = res.Sigma %>% lapply(function(val) val[2,2]) %>% unlist(),
                         v33 = res.Sigma %>% lapply(function(val) val[3,3]) %>% unlist(),
                         v44 = res.Sigma %>% lapply(function(val) val[4,4]) %>% unlist())

Similarly, the time-varying sample correlations can be computed and stored as follows:

sample.corr <- data.frame(X = X,
                          r12 = res.Sigma %>% lapply(function(val) 
                                      val[1,2]/sqrt(val[1,1]*val[2,2])) %>% unlist(),
                          r13 = res.Sigma %>% lapply(function(val) 
                                      val[1,3]/sqrt(val[1,1]*val[3,3])) %>% unlist(),
                          r14 = res.Sigma %>% lapply(function(val) 
                                      val[1,4]/sqrt(val[1,1]*val[4,4])) %>% unlist(),
                          r23 = res.Sigma %>% lapply(function(val) 
                                      val[2,3]/sqrt(val[2,2]*val[3,3])) %>% unlist(),
                          .
                          .
                          .
                          )

There will be 4C2 = 6 columns for sample correlations between each pair of variables. However, if there were 20 variables instead of 4, there would be 20C2 = 190 such columns which will be impossible to write manually as I have written above.

I'm looking for a way to do this automatically, and name the columns appropriately (r_{i,j} for the column containing correlations between the i-th and the j-th variables).

Any help will be appreciated. Let me know if my question is unclear.


Solution

  • Here is a tidyverse example where the columns are generated dynamically. First, we define two functions which will generate the columns and then we apply them to the data frame.

    library(tidyverse)
    
    # create the mutate functions
    create_var_col <- function(df, n, mat) {
        mutate(df, "v{n}{n}" := map(mat, function(x)
            x[n, n]))
    }
    
    create_corr_col <- function(df, j, k, mat) {
        mutate(df, "r{j}{k}" := map(mat, function(x)
            x[j, k] / (x[j, j] * x[k, k])))
    }
    

    Application:

    # initalize res.Sigma and X
    nrow <- 4
    
    res.Sigma <- lapply(seq_len(100), function(x)
        matrix(runif(16), nrow = nrow))
    
    X <- 1:100
    
    # initialize output data frames
    sample.var <- data.frame(
        X = X
    )
    
    sample.corr <- data.frame(
        X = X
    )
    
    # populate the data frames with new columns
    for (n in 1:nrow) {
        sample.var <- create_var_col(sample.var, n, res.Sigma)
    }
    
    combs <- combn(nrow, 2)
    
    for (i in (seq_len(ncol(combs)))) {
    
        j <- combs[1, i]
        k <- combs[2, i]
        
        sample.corr <- create_corr_col(sample.corr, j, k, res.Sigma)
    }
    

    Results:

    head(sample.var)
    
    > head(sample.var)
      X       v11       v22          v33        v44
    1 1 0.3770457 0.1360691 0.8865960506 0.22080969
    2 2 0.7450246 0.6379828 0.0007324344 0.15418729
    3 3 0.4506430 0.5700204 0.7727197134 0.76241635
    4 4 0.3577171 0.7811212 0.2732931704 0.12555681
    5 5 0.9532550 0.9781340 0.2275580675 0.03497771
    6 6 0.0226325 0.3912557 0.4199170584 0.26492276
    
    > head(sample.corr)
      X         r12        r13        r14         r23        r24          r34
    1 1  17.6002993   1.923644  0.8184414   2.3721132 21.9337539    0.6569090
    2 2   0.6498641 817.358339  4.9511748 299.1091543  0.6651190 4276.9960394
    3 3   0.6825716   2.117533  0.8900186   1.9256916  1.9253819    0.6588873
    4 4   0.8280786   8.324110  1.6634117   1.7292685  2.0207334    9.9122315
    5 5   0.9520840   3.642721  3.5012283   2.2722650  0.2582217  116.3901019
    6 6 100.8714017  99.157062 97.8675581   0.5384571  5.6053657    2.5930488