I have a correlation matrix (200x200) in the form of:
>cormat
n1 n2 n3
n1 1.000000000 0.132555050 0.009169320
n2 -0.121419322 1.000000000 -0.174995204
n3 -0.259331076 -0.171652163 1.000000000
Etc.
I want to visualize the distribution of the correlations between the columns of the data frame from which this matrix was created, using a single violin plot. After typing this code:
ggplot()+geom_violin(aes(c(cormat[1:200,]), c(cormat[,1:200])))
I got:
Is this a plausible result? Is there a better way to plot a matrix using geom_violin()?
It will help to make something a bit more representative:
library(ggplot2)
set.seed(69)
df <- data.frame(a = 1:10, b = 1/33 * 1:10 + rnorm(10), c = -(1:10) * 0.1 + rnorm(10),
d = 1/5 * 1:10 + rnorm(10), e = rnorm(10))
cormat <- cor(df)
Now in your example, since cormat is 200 square, c(cormat[1:200,]) is the same as c(cormat[,1:200]), which are both the same as c(cormat), that is, just cormat unrolled into a 40,000 length vector. Your plot is really just a density plot of all the correlation values. I'm not sure how useful this is:
ggplot() + geom_violin(aes(c(cormat), c(cormat)))
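If the goal is a single violin showing the distribution of the pairwise correlations, one option is to drop the diagonal (always 1) and the duplicated half of the matrix first. This is a minimal sketch, assuming cormat is a symmetric matrix produced by cor(); the names pair_cors and p_single are only illustrative, and the small data frame stands in for the real data:

```r
library(ggplot2)

# Small stand-in data; replace with the real data frame
set.seed(69)
df <- data.frame(a = 1:10, b = 1/33 * 1:10 + rnorm(10), c = -(1:10) * 0.1 + rnorm(10),
                 d = 1/5 * 1:10 + rnorm(10), e = rnorm(10))
cormat <- cor(df)

# Keep each pair once: upper triangle only, diagonal excluded
pair_cors <- cormat[upper.tri(cormat)]

# A constant x value puts all correlations into one violin
p_single <- ggplot(data.frame(r = pair_cors), aes(x = "all pairs", y = r)) +
  geom_violin(fill = "grey80") +
  labs(x = NULL, y = "correlation")
p_single
```

For a 200 x 200 matrix this leaves 200 * 199 / 2 = 19,900 values, and the spike at 1 from the diagonal no longer distorts the shape.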
You could instead do a plot of all the correlations separately as violin plots:
plot_df <- reshape2::melt(cormat)
ggplot(data = plot_df) + geom_violin(aes(Var1, value, fill = Var1))
but this won't work well for 200 variables.
A more standard way to represent a correlation matrix this big would be as a correlation plot, like:
ggplot(plot_df) + geom_tile(aes(Var1, Var2, fill = value))
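The tile plot is usually easier to read with a diverging colour scale centred on zero and fixed to the range -1 to 1. The following is a sketch of one way to do that, not the only one; the colour choices and the name p_tile are arbitrary, and as.data.frame(as.table()) is used here as a base R stand-in for reshape2::melt():

```r
library(ggplot2)

# Small stand-in data; replace with the real data frame
set.seed(69)
df <- data.frame(a = 1:10, b = 1/33 * 1:10 + rnorm(10), c = -(1:10) * 0.1 + rnorm(10),
                 d = 1/5 * 1:10 + rnorm(10), e = rnorm(10))
cormat <- cor(df)

# Long format with columns Var1, Var2 and value
plot_df <- as.data.frame(as.table(cormat), responseName = "value")

# Diverging fill: blue for negative, white at zero, red for positive
p_tile <- ggplot(plot_df) +
  geom_tile(aes(Var1, Var2, fill = value)) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-1, 1)) +
  coord_fixed()
p_tile
```

With 200 variables the axis labels will overlap, so it may be worth hiding them with theme(axis.text = element_blank()).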
Created on 2020-07-12 by the reprex package (v0.3.0)