I have two huge data frames with the same column and row names but different values. I want to compute the correlation between the two data frames column by column, pairing only the columns that share a name.
yyyymm `10000` `10001` `10002` `10003` `10004` `10005` `10006`
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 198601 NA NA NA NA NA NA NA
2 198602 NA NA NA NA NA NA NA
3 198603 NA NA NA NA NA NA NA
4 198604 NA NA NA NA NA NA NA
5 198605 NA NA NA NA NA NA NA
6 198606 NA NA NA NA NA NA NA
Both data frames look like this.
That is, I want the correlation of column 10001 in the first data frame with column 10001 in the second data frame, and so on.
PS: The missing values occur only in the first rows.
Does this work?
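set.seed(111)
# Two example data frames with the same column names in a different order
df1 <- data.frame(c1 = rnorm(10),
                  c2 = rnorm(10),
                  c3 = rnorm(10))
df2 <- data.frame(c1 = rnorm(10),
                  c3 = rnorm(10),
                  c2 = rnorm(10))
# Sort the names so that columns are paired by name, then correlate each pair
Map(cor, df1[sort(names(df1))], df2[sort(names(df2))])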
$c1
[1] -0.02421313
$c2
[1] 0.2706937
$c3
[1] -0.1615181
Or, to get a named vector instead of a list:
unlist(Map(cor, df1[sort(names(df1))], df2[sort(names(df2))]))
         c1          c2          c3
-0.02421313  0.27069371 -0.16151811
Using purrr:
library(purrr)
pmap(list(df1[sort(names(df1))], df2[sort(names(df2))]), cor)
$c1
[1] -0.02421313
$c2
[1] 0.2706937
$c3
[1] -0.1615181
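Since the question mentions missing values in the first rows and a yyyymm column, here is a hedged variation in base R. It is only a sketch: the toy data, the extra column c4, and the choice of use = "pairwise.complete.obs" are my assumptions, not something taken from the real data. It pairs columns through intersect() rather than sorting, so it also copes with a column that exists in only one data frame, and it tells cor() to drop rows where either value is NA (by default cor() returns NA as soon as a missing value is present).

```r
set.seed(111)

# Toy data (assumed): leading NAs, a date column, and one unmatched column (c4)
df1 <- data.frame(yyyymm = 198601:198610,
                  c1 = c(NA, NA, rnorm(8)),
                  c2 = c(NA, rnorm(9)))
df2 <- data.frame(yyyymm = 198601:198610,
                  c2 = c(NA, rnorm(9)),
                  c1 = c(NA, NA, rnorm(8)),
                  c4 = rnorm(10))

# Columns present in both data frames, excluding the date column
common <- setdiff(intersect(names(df1), names(df2)), "yyyymm")

# Correlate matching columns, using only rows where both values are non-missing
res <- mapply(function(x, y) cor(x, y, use = "pairwise.complete.obs"),
              df1[common], df2[common])
res
```

Here res is a named numeric vector with one entry per shared column. On the real data, replace "yyyymm" with whatever identifier columns should be skipped.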