r ggplot2 r-corrplot ggcorrplot

How to replicate correlation plot with greyscale coefficients in the lower half and circles in upper half?


I'm looking to replicate this correlation plot, or at least get as close as possible to it.

Specifically, I want:

  • the correlation values in the lower half, shaded on a greyscale according to their absolute value
  • the circles in the top half, with varying diameter and coloured on a colour scale
  • editable axis labels, so that full descriptions are on the y-axis and numeric references on the x-axis

I have gotten relatively close, but have not managed a precise enough replication. My closest attempts are described below with reproducible code. The corrplot package has gotten me closest.

    # general preparation
    # mtcars ships with base R (datasets package), so no extra library is needed
    correlations = cor(mtcars)
  1. corrplot package
    library(corrplot)
    
    corrplot.mixed(correlations, 
                   upper = "number", #upper.col = ???
                   lower = "circle", #lower.col = ???
                   tl.pos = "lt", tl.col = "black", tl.cex = 0.5)
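
The two commented-out arguments above, `upper.col` and `lower.col`, do exist in `corrplot.mixed()` and each takes a colour vector for its half. Below is a hedged sketch of one way to fill them in. It assumes the installed corrplot version spreads the colour vector evenly over [-1, 1], which is its usual behaviour for correlation matrices. Under that assumption, a grey ramp that is symmetric around zero makes the shade of each number depend only on its absolute value. The names `grey_sym` and `circle_cols` are made up for this illustration, and the halves are swapped relative to the attempt above so that the numbers land in the lower triangle.

```r
# Symmetric grey ramp: dark at -1 and +1, light at 0, so the shade tracks
# |r| rather than the sign (assumes colours are spread evenly over [-1, 1]).
grey_sym <- colorRampPalette(c("grey20", "grey80", "grey20"))(201)

# Any sequential palette works for the circles; "viridis" is available in
# base R through grDevices::hcl.colors (R >= 3.6).
circle_cols <- hcl.colors(201, palette = "viridis")

correlations <- cor(mtcars)

# Only draw if corrplot is installed; the palettes above are plain base R.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot.mixed(correlations,
                           lower = "number", lower.col = grey_sym,
                           upper = "circle", upper.col = circle_cols,
                           tl.pos = "lt", tl.col = "black", tl.cex = 0.5)
}
```

Whether the result is close enough to the target plot still has to be judged by eye; the sketch only addresses the colouring of the two halves.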

Notes:

  2. cormat package
    source("http://www.sthda.com/upload/rquery_cormat.r")

    rquery.cormat(mtcars)
  3. ggcorrplot
    library("ggcorrplot")
    
    # circles separate
    ggcorrplot(correlations, # correlation matrix
               method = "circle", # circles instead of squares
               type = "upper", # show only upper triangle
               show.diag = FALSE, # don't show diagonal values (1)
               lab = FALSE, # don't show cor coeffs
               outline.col = "white", # no outline of circles
               ggtheme = theme_bw, # theme
               colors = c("#440154FF","#238A8DFF","#FDE725FF"))
    
    # coefs separate
    ggcorrplot(correlations, # correlation matrix
               method = "circle", # circles instead of squares
               type = "upper", # show only upper triangle
               show.diag = FALSE, # don't show diagonal values (1)
               lab = TRUE, # show cor coeffs
               outline.col = NA, # don't show circles
               ggtheme = theme_bw, # theme
               colors = c("#440154FF","#238A8DFF","#FDE725FF"))
    
    # can't combine both plots?
  4. corrgram package
    library(corrgram)
    
    corrgram(correlations,
             labels = indices_all, # NOTE: indices_all is not defined in this post
             lower.panel = "panel.fill",
             upper.panel = "panel.cor")

Some other notes:

  • The halves of these plots tend to be split along the opposite diagonal to the one in the example plot, but I guess that's not a big concern.

Solution

  • Out-of-the-box options are quick and nice. However, when it comes to customizing, IMHO it may be worthwhile to build the plot from scratch using ggplot2. As a first step this involves some data wrangling to get your correlation matrix into long format, with one row per cell. In the same step I convert the categories to factors and add a numeric id for each. Based on the ids I split the data into the values on either side of the diagonal, which are then plotted separately using a geom_point and a geom_text. Besides that, it's important to add drop = FALSE to the x and y scales to keep all factor levels in the right order. I also pass functions to the labels argument of both scales to get the desired axis labels:

    EDIT: Following the suggestion by @AllanCameron I added a coord_equal as the "final" touch to get a nice, square, matrix-like look. And thanks to @RichtieSacramento the code now maps the absolute value to the size aesthetic.
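
The reshape-and-split step described above can be tried in isolation before reading the full code. What follows is a minimal base-R sketch of the same idea, not the answer's own implementation (that one uses dplyr and tidyr and comes next). The names `vars`, `long`, `circles` and `coef_labels` are illustrative only.

```r
correlations <- cor(mtcars)
vars <- colnames(correlations)

# One row per matrix cell; as.vector() walks the matrix column by column,
# and row()/col() give the matching row and column index of each cell.
long <- data.frame(
  row   = vars[as.vector(row(correlations))],
  col   = vars[as.vector(col(correlations))],
  value = as.vector(correlations)
)

# Numeric ids follow the original column order of mtcars.
long$rowid <- match(long$row, vars)
long$colid <- match(long$col, vars)

# Split on the ids: one side of the diagonal for the circles, the other
# for the printed coefficients. The diagonal itself is dropped.
circles     <- subset(long, rowid > colid)
coef_labels <- subset(long, rowid < colid)
```

With the 11 variables of mtcars, each of the two subsets holds 55 cells, and the 11 diagonal cells belong to neither.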

    library(dplyr)
    library(tidyr)
    library(ggplot2)
    
    correlations = cor(mtcars)
    
    levels <- colnames(mtcars)
    
    corr_long <- correlations %>%
      data.frame() %>% 
      mutate(row = factor(rownames(.), levels = levels), 
             rowid = as.numeric(row)) %>%
      pivot_longer(-c(row, rowid), names_to = "col") %>%
      mutate(col = factor(col, levels = levels),
             colid = as.numeric(col))
    
    ggplot(corr_long, aes(col, row)) +
      geom_point(aes(size = abs(value), fill = value), 
                 data = ~filter(.x, rowid > colid), shape = 21) +
      geom_text(aes(label = scales::number(value, accuracy = .01), color = abs(value)), 
                data = ~filter(.x, rowid < colid), size = 8 / .pt) +
      scale_x_discrete(labels = ~ attr(.x, "pos"), drop = FALSE) +
      scale_y_discrete(labels = ~ paste0(.x, " (", attr(.x, "pos"), ")"), drop = FALSE) +
      scale_fill_viridis_c(limits = c(-1, 1)) +
      scale_color_gradient(low = grey(.8), high = grey(.2)) +
      coord_equal() +
      guides(size = "none", color = "none") +
      theme(legend.position = "bottom", 
            panel.grid = element_blank(), 
            axis.ticks = element_blank()) +
      labs(x = NULL, y = NULL, fill = NULL)
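
The label functions above read a `"pos"` attribute from the breaks. That attribute does not appear to be part of ggplot2's documented interface, so it may be safer not to depend on it. A hedged alternative, sketched below, builds the labels up front as named character vectors; ggplot2 matches named labels to discrete breaks by name. The names `x_labels` and `y_labels` are illustrative, and the plain variable names could be swapped for full descriptions in `y_labels`.

```r
vars <- colnames(mtcars)

# Names are the factor levels (the breaks); values are the text to display.
x_labels <- setNames(as.character(seq_along(vars)), vars)
y_labels <- setNames(paste0(vars, " (", seq_along(vars), ")"), vars)

# Possible drop-in replacements for the two scale calls in the plot above:
#   scale_x_discrete(labels = x_labels, drop = FALSE)
#   scale_y_discrete(labels = y_labels, drop = FALSE)
```

Because the labels are keyed by level name, they stay attached to the right variable even if the axis order is changed later.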
    
