
Correlate only columns with same name


I have two large data frames with the same column and row names but different values. I want to compute the correlation between the two data frames column by column, pairing only the columns that have the same name.

  yyyymm `10000` `10001` `10002` `10003` `10004` `10005` `10006`
   <int>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 198601      NA      NA      NA      NA      NA      NA      NA
2 198602      NA      NA      NA      NA      NA      NA      NA
3 198603      NA      NA      NA      NA      NA      NA      NA
4 198604      NA      NA      NA      NA      NA      NA      NA
5 198605      NA      NA      NA      NA      NA      NA      NA
6 198606      NA      NA      NA      NA      NA      NA      NA

Both data frames look like the one above.

That is, I want the correlation of column 10001 in the first data frame with column 10001 in the second data frame, and so on.

PS: the missing values are only in the first rows.


Solution

  • Does this work:

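    set.seed(111)
    # Two toy data frames with the same column names in a different order
    df1 <- data.frame(c1 = rnorm(10),
                      c2 = rnorm(10),
                      c3 = rnorm(10))
    df2 <- data.frame(c1 = rnorm(10),
                      c3 = rnorm(10),
                      c2 = rnorm(10))
    # Sort the columns by name so they pair up, then correlate pair by pair
    Map(cor, df1[sort(names(df1))], df2[sort(names(df2))])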
    $c1
    [1] -0.02421313
    
    $c2
    [1] 0.2706937
    
    $c3
    [1] -0.1615181
    

    Or, to get a named numeric vector instead of a list:

    unlist(Map(cor, df1[sort(names(df1))], df2[sort(names(df2))]))
             c1          c2          c3 
    -0.02421313  0.27069371 -0.16151811 
    

    Using purrr:

    library(purrr)
    pmap(list(df1[sort(names(df1))], df2[sort(names(df2))]), cor)
    $c1
    [1] -0.02421313
    
    $c2
    [1] 0.2706937
    
    $c3
    [1] -0.1615181
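
The examples above assume both data frames contain exactly the same set of columns and no missing values. The data in the question has NAs in the first rows and a yyyymm key column, so here is a minimal sketch of one way to handle that, not a definitive solution. The toy data is made up for illustration, and the sketch assumes the rows of both data frames are already aligned (same yyyymm order). It pairs columns with intersect() and passes use = "pairwise.complete.obs" to cor() so that rows with an NA in either column are skipped.

```r
# Hypothetical toy data mimicking the question: numeric column names,
# leading NAs, and one column in each data frame that the other lacks.
set.seed(111)
df1 <- data.frame(yyyymm  = 198601:198610,
                  `10000` = c(NA, NA, rnorm(8)),
                  `10001` = c(NA, rnorm(9)),
                  `10002` = rnorm(10),
                  check.names = FALSE)
df2 <- data.frame(yyyymm  = 198601:198610,
                  `10001` = c(NA, NA, NA, rnorm(7)),
                  `10000` = rnorm(10),
                  `10003` = rnorm(10),
                  check.names = FALSE)

# Columns present in both data frames, excluding the date key.
common <- setdiff(intersect(names(df1), names(df2)), "yyyymm")

# One correlation per shared column; rows where either value is NA are dropped.
cors <- vapply(common,
               function(nm) cor(df1[[nm]], df2[[nm]],
                                use = "pairwise.complete.obs"),
               numeric(1))
cors
```

Here cors is a numeric vector named by column, with one entry per shared column. Using intersect() rather than sort() means a column that exists in only one data frame is ignored instead of being paired with the wrong partner.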