
Visualize a matrix using geom_violin() in R


I have a correlation matrix (200x200) in the form of:

 >cormat

          n1              n2              n3  
 n1    1.000000000   0.132555050     0.009169320    
 n2   -0.121419322   1.000000000    -0.174995204    
 n3   -0.259331076  -0.171652163     1.000000000

Etc.

I want to visualize the distribution of the correlations between the columns of the data frame from which this matrix was computed, using a single violin plot. I ran this code:

 ggplot()+geom_violin(aes(c(cormat[1:200,]), c(cormat[,1:200])))

I got:

[Image: violin plot of the correlation matrix]

Is this a plausible result? Is there a better way to plot a matrix using geom_violin()?


Solution

  • It will help to have a small reproducible example to work with:

    library(ggplot2)
    
    set.seed(69)
    df <- data.frame(a = 1:10, b = 1/33 * 1:10 + rnorm(10), c = -(1:10) * 0.1 + rnorm(10),
                     d = 1/5 * 1:10 + rnorm(10), e = rnorm(10))
    cormat <- cor(df)
    

    Now in your example, since cormat is 200 x 200, c(cormat[1:200,]) is the same as c(cormat[,1:200]), and both are the same as c(cormat), that is, just cormat unrolled into a vector of length 40,000. Your plot is really just a density plot of all the correlation values. I'm not sure how useful this is:

    ggplot() + geom_violin(aes(c(cormat), c(cormat)))
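
    If a single violin is what you want, one possible refinement is to count each pair of variables once and leave out the diagonal of 1s. The following is only a sketch: it assumes cormat is symmetric (as the output of cor() is), it uses a randomly generated 20 x 20 matrix as a hypothetical stand-in for your data, and pair_cors is a name I made up for illustration.

    ```r
    library(ggplot2)

    # Hypothetical stand-in for the real matrix: correlations of 20 random columns
    set.seed(1)
    cormat <- cor(matrix(rnorm(50 * 20), nrow = 50, ncol = 20))

    # Selecting all rows or all columns returns the whole matrix,
    # so both expressions unroll to the same vector as c(cormat)
    stopifnot(identical(c(cormat[1:20, ]), c(cormat)),
              identical(c(cormat[, 1:20]), c(cormat)))

    # Keep each pair once (upper triangle) and drop the diagonal of 1s
    pair_cors <- cormat[upper.tri(cormat)]

    # A constant x value puts all correlations into one violin
    p <- ggplot(data.frame(value = pair_cors), aes(x = "", y = value)) +
      geom_violin() +
      labs(x = NULL, y = "Pairwise correlation")
    print(p)
    ```

    For a 200 x 200 matrix this would leave 200 * 199 / 2 = 19,900 values in the violin instead of 40,000.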
    

    You could instead draw one violin per variable, each showing that variable's correlations with all the others:

    
    plot_df <- reshape2::melt(cormat)
    ggplot(data = plot_df) + geom_violin(aes(Var1, value, fill = Var1))
    

    but this won't work well for 200 variables.

    A more standard way to represent a correlation matrix this large would be as a heatmap (correlation plot), like:

    ggplot(plot_df) + geom_tile(aes(Var1, Var2, fill = value))
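
    A diverging colour scale centred on zero usually makes such a heatmap easier to read, because the sign of each correlation is visible at a glance. This is a rough sketch rather than the only way to do it: the data are a hypothetical 6-column stand-in, the colours are arbitrary choices, and I have swapped reshape2::melt() for base R's as.data.frame(as.table(...)) so the block runs with only ggplot2 installed.

    ```r
    library(ggplot2)

    # Hypothetical stand-in data with named columns n1..n6
    set.seed(1)
    m <- matrix(rnorm(50 * 6), nrow = 50,
                dimnames = list(NULL, paste0("n", 1:6)))
    cormat <- cor(m)

    # Long format in base R: one row per (Var1, Var2) pair plus the correlation
    plot_df <- as.data.frame(as.table(cormat), responseName = "value")

    # Diverging fill with fixed limits so colours are comparable across plots
    p <- ggplot(plot_df, aes(Var1, Var2, fill = value)) +
      geom_tile() +
      scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                           midpoint = 0, limits = c(-1, 1)) +
      coord_fixed() +
      labs(x = NULL, y = NULL, fill = "Correlation")
    print(p)
    ```

    With 200 variables the axis labels will overlap, so you may want to hide them with theme(axis.text = element_blank()).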
    

    Created on 2020-07-12 by the reprex package (v0.3.0)