I'm trying to find a way to use a nested for loop in R to get every possible correlation combination of this form:
cor(y, column1 * column2),
cor(y, column1 * column3),
cor(y, column1 * column4),
cor(y, column2 * column3)
or in my example:
cor(MP, FG_pct * FGA),
cor(MP, FG_pct * FT),
cor(MP, FG_pct * FT_pct),
and so on.
This is what I have tried so far:
for(i in 1:length(dataframe))
{
  for(j in 1:length(dataframe))
  {
    joint_correlation(i,j)=cor(MP, dataframe(i) * dataframe(j));
  }
}
My data frame has 115 columns; here is a small sample:
FG_pct FGA FT FT_pct FTA GP GS GmSc   MP ORB
 0.625   8  0   0.00   0  1  0  6.6 28.4   2
 0.500   4  0   0.00   1  2  0  2.1 17.5   0
 0.000   1  0   0.00   0  3  0  1.2  6.6   1
 0.500   6  0   0.00   0  4  0  3.6 13.7   1
 0.500   2  0   0.00   0  5  0  0.9  7.4   1
I want to compute cor(MP, column1 * column2) for every possible pair of columns substituted for column1 and column2, so that I don't have to write each one out separately. I believe a loop over all of the combinations is the best way. If possible, I would like to save the output of each combination, e.g. cor(MP, FG_pct * FGA), cor(MP, FG_pct * FT_pct), cor(MP, GmSc * ORB), etc., in a separate column.
EDIT
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.4
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.8.5 magrittr_1.5 ggplot2_3.3.0 corrr_0.4.2 RColorBrewer_1.1-2
[6] readr_1.3.1 corrplot_0.84
loaded via a namespace (and not attached):
[1] Rcpp_1.0.4 rstudioapi_0.11 knitr_1.24 MASS_7.3-51.5 hms_0.5.3 tidyselect_1.0.0
[7] munsell_0.5.0 colorspace_1.4-1 R6_2.4.1 rlang_0.4.5 tools_3.6.1 grid_3.6.1
[13] gtable_0.3.0 xfun_0.9 withr_2.1.2 assertthat_0.2.1 tibble_2.1.3 lifecycle_0.2.0
[19] crayon_1.3.4 farver_2.0.3 purrr_0.3.3 vctrs_0.2.4 glue_1.3.2 compiler_3.6.1
[25] pillar_1.4.3 scales_1.1.0 pkgconfig_2.0.3
Assuming you want the correlation of MP with the product of every combination of two of the remaining columns:
We can find the names of the corresponding combinations using combn() on all column names except MP, and prepend "MP" to each combination with rbind inside an lapply.
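# stringsAsFactors = FALSE keeps the names as character; under R < 4.0 they would
# otherwise become factors, and indexing dat[, x] with a factor uses its integer codes
combs <- do.call(cbind.data.frame,
                 c(lapply("MP", rbind, combn(names(dat)[names(dat) != "MP"], 2)),
                   stringsAsFactors = FALSE))
combs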
# 1 2 3
# 1 MP MP MP
# 2 FG_pct FG_pct FGA
# 3 FGA FT FT
In another lapply we subset the data on each name combination and calculate cor(x1, x2 * x3). At the same time we store the names, pasted together as a label, in an attribute, so that we remember later what was calculated in each iteration.
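res.l <- lapply(combs, function(x) {
  # x is one column of combs, e.g. c("MP", "FG_pct", "FGA")
  `attr<-`(cor(dat[, x[1]], dat[, x[2]] * dat[, x[3]]),
           # label the result so we know later which combination it belongs to
           "what", paste0(x[1], ", ", paste(x[2], "*", x[3])))
})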
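The `attr<-` call used above is simply the functional form of the replacement syntax attr(x, which) <- value: called in backticks it returns a modified copy, which is why it can sit inline as the return value of the anonymous function. A minimal illustration (the value 0.5 and the label are arbitrary placeholders); base R's structure() is an equivalent way to write it:

```r
# Functional form: returns a copy of 0.5 carrying a "what" attribute
r <- `attr<-`(0.5, "what", "MP, FG_pct * FGA")
attr(r, "what")   # "MP, FG_pct * FGA"

# Equivalent two-step replacement form
r2 <- 0.5
attr(r2, "what") <- "MP, FG_pct * FGA"

# Equivalent one-liner with structure()
r3 <- structure(0.5, what = "MP, FG_pct * FGA")

identical(r, r2)  # TRUE
identical(r, r3)  # TRUE
```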
Finally, we unlist the results and name them with setNames according to the attributes.
res <- setNames(unlist(res.l), sapply(res.l, attr, "what"))
res
# MP, FG_pct * FGA MP, FG_pct * FT MP, FGA * FT
# 0.2121374 0.2829003 0.4737892
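As an alternative sketch (a different technique from the one above, not a replacement for it), combn() can apply a function to each pair directly through its FUN argument, which skips the intermediate combs data frame and the factor pitfall entirely. It assumes the target column is MP and all other columns are numeric; for a self-contained example it uses the five sample rows from the question, leaving out FT and FT_pct because they are all zero there and any product with them would give NA (zero standard deviation).

```r
# Sketch: cor(MP, a * b) for every pair of non-MP columns via combn(FUN = ...)
dat <- data.frame(
  FG_pct = c(0.625, 0.500, 0.000, 0.500, 0.500),
  FGA    = c(8, 4, 1, 6, 2),
  GmSc   = c(6.6, 2.1, 1.2, 3.6, 0.9),
  ORB    = c(2, 0, 1, 1, 1),
  MP     = c(28.4, 17.5, 6.6, 13.7, 7.4)
)

vars <- setdiff(names(dat), "MP")  # every column except the target

# combn() calls FUN once per pair, passing the pair as a length-2 character vector p
res2 <- combn(vars, 2, FUN = function(p) cor(dat$MP, dat[[p[1]]] * dat[[p[2]]]))
names(res2) <- combn(vars, 2, FUN = function(p) paste0("MP, ", p[1], " * ", p[2]))
res2
```

If MP is one of the 115 columns, this computes choose(114, 2) = 6441 correlations.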
Check (note that inside with() you can put the column names, e.g. MP, FG_pct * FGA, directly into the cor function):
with(dat, cor(MP, FG_pct * FGA))
# [1] 0.2121374
with(dat, cor(MP, FG_pct * FT))
# [1] 0.2829003
with(dat, cor(MP, FGA * FT))
# [1] 0.4737892
To sort, use e.g. sort(res) or rev(sort(res)).
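The question also asks to store each result separately. One way to do that, sketched below under the same assumptions (MP is the target, other columns numeric, the question's sample rows without the all-zero FT and FT_pct), is a data frame with one row per pair and the correlation in its own column, sorted by absolute correlation. The column names var1, var2 and cor are illustrative choices, not anything required by R.

```r
dat <- data.frame(
  FG_pct = c(0.625, 0.500, 0.000, 0.500, 0.500),
  FGA    = c(8, 4, 1, 6, 2),
  GmSc   = c(6.6, 2.1, 1.2, 3.6, 0.9),
  ORB    = c(2, 0, 1, 1, 1),
  MP     = c(28.4, 17.5, 6.6, 13.7, 7.4)
)

vars  <- setdiff(names(dat), "MP")
pairs <- t(combn(vars, 2))  # one row per pair of column names

out <- data.frame(var1 = pairs[, 1], var2 = pairs[, 2], stringsAsFactors = FALSE)
# One correlation per row: cor(MP, var1 * var2)
out$cor <- mapply(function(a, b) cor(dat$MP, dat[[a]] * dat[[b]]),
                  out$var1, out$var2, USE.NAMES = FALSE)

# Strongest (absolute) correlations first
out <- out[order(abs(out$cor), decreasing = TRUE), ]
rownames(out) <- NULL
out
```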
Toy data:
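set.seed(42)
# Simulate 4 correlated columns; with empirical = TRUE the sample means and
# covariance matrix match mu and Sigma exactly
dat <- as.data.frame(`colnames<-`(MASS::mvrnorm(n=1e4,
                                                mu=c(0.425, 4.2, 0.2, 3),
                                                Sigma=matrix(c(1, .3, .7, 0,
                                                               .3, 1, .5, 0,
                                                               .7, .5, 1, 0,
                                                               0, 0, 0, 1), nrow=4),
                                                empirical=TRUE),
                                  c("FG_pct", "MP", "FGA", "FT")))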
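For comparison, the nested for loop from the question can also be made to work. A minimal sketch, assuming the data frame is called dat and all columns are numeric: R indexes a matrix with [i, j] and extracts a data frame column with [[ ]], whereas dataframe(i) tries to call a function; the inner loop starts at i + 1 so each unordered pair is computed once. The name joint_correlation is taken from the question; the data are the question's sample rows, again without the all-zero FT and FT_pct.

```r
dat <- data.frame(
  FG_pct = c(0.625, 0.500, 0.000, 0.500, 0.500),
  FGA    = c(8, 4, 1, 6, 2),
  GmSc   = c(6.6, 2.1, 1.2, 3.6, 0.9),
  ORB    = c(2, 0, 1, 1, 1),
  MP     = c(28.4, 17.5, 6.6, 13.7, 7.4)
)

vars <- setdiff(names(dat), "MP")  # candidate columns, excluding the target
n <- length(vars)

# Result matrix labelled by column name; cells below the diagonal stay NA
joint_correlation <- matrix(NA_real_, n, n, dimnames = list(vars, vars))

for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {  # j > i: each unordered pair exactly once
    joint_correlation[i, j] <- cor(dat$MP, dat[[vars[i]]] * dat[[vars[j]]])
  }
}
joint_correlation
```

The vectorised answer above is usually easier to extend, but the loop produces the same numbers in matrix form.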