This has come up a couple of times in the past year or so, and each time it causes a minor headache.
Essentially, I have a few times wanted to make a correlation plot using the corrplot package, but the data I have is a single column of correlations rather than a correlation matrix. What I need is an end-to-end way of converting the data from a table or data frame into a correlation matrix.
I have managed to get most of the way to a solution, but haven't been able to solve it completely.
I start by doing the following:
library(reshape)  # cast() below comes from the reshape package, which also provides melt()
filtered_rgMat = melt(df[,c("cohort1", "cohort2", "rg")])
### For reproducibility purposes:
dput(filtered_rgMat)
structure(list(cohort1 = c("AGDS", "AGDS", "AGDS", "AGDS", "AGDS",
"AGDS", "AOU", "AOU", "AOU", "AOU", "AOU", "AOU", "FINNGEN",
"FINNGEN", "FINNGEN", "FINNGEN", "FINNGEN", "FINNGEN", "GBP",
"GBP", "GBP", "GBP", "GBP", "GBP", "GS20K", "GS20K", "GS20K",
"GS20K", "GS20K", "GS20K", "MGBB", "MGBB", "MGBB", "MGBB", "MGBB",
"MGBB", "UKB", "UKB", "UKB", "UKB", "UKB", "UKB"), cohort2 = c("AOU",
"FINNGEN", "GBP", "GS20K", "MGBB", "UKB", "AGDS", "FINNGEN",
"GBP", "GS20K", "MGBB", "UKB", "AGDS", "AOU", "GBP", "GS20K",
"MGBB", "UKB", "AGDS", "AOU", "FINNGEN", "GS20K", "MGBB", "UKB",
"AGDS", "AOU", "FINNGEN", "GBP", "MGBB", "UKB", "AGDS", "AOU",
"FINNGEN", "GBP", "GS20K", "UKB", "AGDS", "AOU", "FINNGEN", "GBP",
"GS20K", "MGBB"), variable = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), levels = "rg", class = "factor"), value = c(1.147,
1.133, 0.8881, 0.6743, 0.7062, 0.866, 1.147, 2.81, 0.6095, 0.8929,
0.4563, 1.796, 1.133, 2.81, 2.388, 1.492, 1.703, 0.7535, 0.8881,
0.6095, 2.388, 0.8204, 0.2822, 1.641, 0.6743, 0.8929, 1.492,
0.8204, 0.5716, 1.152, 0.7062, 0.4563, 1.703, 0.2822, 0.5716,
1.288, 0.866, 1.796, 0.7535, 1.641, 1.152, 1.288)), row.names = c(NA,
-42L), class = "data.frame")
This is an example of the data I have to work with. I then do the following:
### Convert data frame into matrix
filtered_rgMat = cast(filtered_rgMat, cohort1 ~ cohort2 )
### Fix row names
rownames(filtered_rgMat) = filtered_rgMat[,1]
filtered_rgMat = filtered_rgMat[,-1]
### Set the diagonal of the matrix to a correlation of 1 (i.e. where the column == the row)
for (i in 1:ncol(filtered_rgMat)) {
filtered_rgMat[i, i] = 1
}
### Superfluous for the example data, but I also convert any other NAs into 0s
filtered_rgMat[is.na(filtered_rgMat)] = 0
The issue then is that after all of this, despite having what seems to at least look like a correlation matrix, corrplot still isn't happy with it and throws the following error:
corrplot(filtered_rgMat)
Error in corrplot(filtered_rgMat) : The matrix is not in [-1, 1]!
You can try and fix this by specifically telling corrplot it is not a correlation matrix, but this also doesn't work:
corrplot(filtered_rgMat, is.corr = FALSE)
Error in is.finite(tmp) : default method not implemented for type 'list'
The only other question addressing this that I could find on Stack Overflow basically said to do cor(df), which absolutely does not produce the right table here, as it changes all the values.
Does anyone have any suggestions for where I can go from here?
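Both error messages can be reproduced without corrplot, which may help pin down what is going wrong. The sketch below uses a made-up two-cohort data frame (toy_df and its values are hypothetical, standing in for the output of cast()). It shows that a data frame is a list, which is.finite() rejects, and that the values exceed 1, which is presumably what the default correlation check objects to.

```r
# Hypothetical stand-in for the cast() result: a data frame, not a matrix
toy_df <- data.frame(A = c(1, 2.81), B = c(2.81, 1), row.names = c("A", "B"))

# A data frame is a list of columns, so is.finite() has no method for it
is.list(toy_df)
list_error <- tryCatch(is.finite(toy_df), error = function(e) conditionMessage(e))
print(list_error)

# After as.matrix() the same call works on a plain numeric matrix
toy_mat <- as.matrix(toy_df)
print(is.finite(toy_mat))

# The off-diagonal values are larger than 1, so this is not a valid
# correlation matrix in the strict sense
out_of_range <- max(toy_mat) > 1
print(out_of_range)
```

So, under these assumptions, two separate things need fixing: the object has to be a numeric matrix, and corrplot has to be told not to expect values in [-1, 1].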
You can reshape your data into wide format using pivot_wider on the cohort1 column, ensure your columns and rows are in the correct order using arrange, drop the cohort2 column, replace all the NA values with 0 using mutate, convert to a data frame, rename the rows, convert to a matrix and then pass to corrplot, specifying is.corr = FALSE:
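library(tidyverse)
filtered_rgMat %>%
  select(-variable) %>%                                        # drop the constant "rg" label column
  pivot_wider(names_from = cohort1, values_from = value) %>%   # one column per cohort1
  arrange(cohort2) %>%                                         # sort rows alphabetically by cohort2
  select(-cohort2) %>%                                         # keep only the numeric columns
  mutate(across(everything(), ~ifelse(is.na(.x), 0, .x))) %>%  # NA (here, the diagonal) becomes 0
  as.data.frame() %>%
  `row.names<-`(names(.)) %>%                                  # label rows with the column names
  as.matrix() %>%                                              # corrplot needs a numeric matrix
  corrplot::corrplot(is.corr = FALSE)                          # values exceed 1, so skip the [-1, 1] check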
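If you would rather avoid the tidyverse, a base R alternative is to fill an empty matrix by name. This is only a sketch on made-up data: pairs, cohorts and mat are hypothetical names, and the three cohorts and their values are invented. Unlike the pipeline above, it sets the diagonal to 1 rather than 0, as the question intended.

```r
# Hypothetical long-format data: one row per ordered pair of cohorts
pairs <- data.frame(
  cohort1 = c("A", "A", "B", "B", "C", "C"),
  cohort2 = c("B", "C", "A", "C", "A", "B"),
  value   = c(0.5, 1.2, 0.5, 0.8, 1.2, 0.8),
  stringsAsFactors = FALSE
)

# All cohort names, sorted, used for both rows and columns
cohorts <- sort(unique(c(pairs$cohort1, pairs$cohort2)))

# Start from a matrix of zeros so missing pairs default to 0
mat <- matrix(0, nrow = length(cohorts), ncol = length(cohorts),
              dimnames = list(cohorts, cohorts))

# A two-column character matrix indexes cells by (row name, column name)
mat[cbind(pairs$cohort1, pairs$cohort2)] <- pairs$value

# Each cohort correlates perfectly with itself
diag(mat) <- 1

print(mat)

# With corrplot installed, the result can be plotted the same way:
# corrplot::corrplot(mat, is.corr = FALSE)
```

Because the result is already a numeric matrix, no as.matrix() step is needed. With the real data you would substitute filtered_rgMat for pairs.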