I am following the recommendation to build a plain correlation matrix with rcorr, using the mtcars dataset in R. I would like to find the correlation between each pair of columns (mpg to cyl, mpg to disp, mpg to hp, and similarly for all other columns; multi sampling) for each of the cars listed as row names. I understand this would create a large result, but for each correlation I would like to know the row name. My current code looks like this:
require(ggpubr)
require(tidyverse)
require(Hmisc)
require(corrplot)
data(mtcars)
flattenCorrMatrix <- function(cormat, pmat) {
  # keep only the upper triangle so each pair of variables appears once
  ut <- upper.tri(cormat)
  data.frame(
    row = rownames(cormat)[row(cormat)[ut]],
    column = rownames(cormat)[col(cormat)[ut]],
    cor = cormat[ut],
    p = pmat[ut]
  )
}
tt <- mtcars
head(tt)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
dm = data.matrix(tt)
cc = rcorr(dm, type="pearson")
rcc = flattenCorrMatrix(cc$r, cc$P)
rc = data.frame(rcc)
head(rc)
The result is:
row  column  cor      p
mpg  cyl     -0.8522  0.000000000611269
mpg  disp    -0.8476  0.000000000938033
cyl  disp     0.9020  0.000000000001803
mpg  hp      -0.7762  0.000000178783525
cyl  hp       0.8324  0.000000003477861
disp hp       0.7909  0.000000071426787
However, I would like to know which car each correlation belongs to, i.e. add a "car model" column to the data frame above. In this case, the car model is the row name from mtcars (tt above).
Any help to resolve this is appreciated.
What you're asking for isn't actually possible, because each correlation listed above is computed from the data for multiple cars. For example, let's look at the first row:
row  column  cor      p
mpg  cyl     -0.8522  0.000000000611269
This is a correlation between all values in the mpg column in your dataset and all values in the cyl column. So each row of your results is actually considering all cars in the mtcars dataset.
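To make this concrete, here is a minimal sketch using only base R's cor (not rcorr, so Hmisc is not needed for this check). The variable names r_all, n_cars, one_car and r_one are made up for illustration. It computes the mpg/cyl correlation over the whole dataset, counts how many cars went into it, and then tries the same calculation for a single car, which should come back as NA because one observation has no variation to correlate.

```r
data(mtcars)

# Pearson correlation between the full mpg and cyl columns:
# every car in the dataset contributes to this single number
r_all <- cor(mtcars$mpg, mtcars$cyl)
n_cars <- nrow(mtcars)

# One car gives exactly one (mpg, cyl) pair; a correlation needs
# several observations, so no value can be computed here
one_car <- mtcars["Mazda RX4", c("mpg", "cyl")]
r_one <- suppressWarnings(cor(one_car$mpg, one_car$cyl))

print(round(r_all, 4))  # should match the mpg/cyl row reported by rcorr
print(n_cars)           # number of cars behind that one correlation
print(r_one)            # missing: undefined for a single car
```

In other words, a "car model" column cannot be attached to the flattened correlation table, since every row of that table already summarises all cars at once.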