Changing correlation matrix color scheme to start at specified colorlabel


I have a dataset where the lowest correlation coefficient is 0.83 and the highest is around 0.99. I am using the "corrplot" package in R and trying to get a color spectrum from the "colorRamps" package. I want the two ends of the spectrum to sit at my specified lower and upper limits (0.8 and 1). I've looked almost everywhere but can't find a solution. I also can't load the color scheme that I want.

I have used colorRampPalette successfully, but I still can't get the beginning and the end of the color spectrum to start and end at my specified limits.

Here is what I have tried:

library(corrplot)
library(colorRampPalette)
library(colorRamps)

pal <- colorRamps::matlab.like2


###########Notice my cl.lim is set to 0.8-1################

corrplot(data, method = "number", type="lower", is.corr=FALSE, margin=c(0,0,0,0),col=pal, tl.col='black', cl.lim = c(0.8, 1),tl.cex=0.7, outline=TRUE, title = "9 DAT")

After running the "corrplot" code line, I get the following:

"Warning messages:
1: In text.default(pos.xlabel[, 1], pos.xlabel[, 2], newcolnames, srt = tl.srt,  :
  "margin" is not a graphical parameter
2: In text.default(pos.ylabel[, 1], pos.ylabel[, 2], newrownames, col = tl.col,  :
  "margin" is not a graphical parameter
3: In title(title, ...) : "margin" is not a graphical parameter"

My graph also does not generate.

I would greatly appreciate the help. Thanks everyone!


Solution

  • I use ggplot2 for plotting. So, let me show you how to achieve what you need in ggplot2. Let's generate a valid correlation matrix first:

    library(ggplot2)
    library(tidyr)
    library(dplyr)  # provides %>% and mutate(), used below
    
    set.seed(123)
    
    df <- data.frame(X1 = 1:100,
                     X2 = 0.75*(1:100) + rnorm(100),
                     X3 = 0.25*(1:100) + rnorm(100,sd = 20),
                     X4 = 0.5*(1:100) + rnorm(100,sd = 10))
    
    cm <- round(cor(df), 2)
    cm[lower.tri(cm)] <- NA
    
    cm <- as.data.frame(cm) %>% 
      mutate(Var1 = factor(row.names(.), levels=row.names(.))) %>% 
      gather(key = Var2, value = value, -Var1, na.rm = TRUE, factor_key = TRUE)
    

    Output

    # Var1 Var2 value
    # 1    X1   X1  1.00
    # 5    X1   X2  1.00
    # 6    X2   X2  1.00
    # 9    X1   X3  0.43
    # 10   X2   X3  0.43
    # 11   X3   X3  1.00
    # 13   X1   X4  0.86
    # 14   X2   X4  0.85
    # 15   X3   X4  0.38
    # 16   X4   X4  1.00
    

    Now, suppose one wants to plot this correlation matrix using ggplot2:

    ggplot(data = cm) + 
      geom_tile(aes(Var2, Var1, fill = value)) +
      scale_fill_gradientn(colours = rainbow(5))
    


    By default the color scale spans range(cm$value), i.e., the full range of the plotted values. Suppose one wants the colors to span [0.5, 0.9] instead. Setting the limits argument alone is not enough: values outside the limits are treated as out of bounds and, by default, replaced with NA, so those tiles are drawn in grey. Passing oob = scales::squish instead clamps out-of-range values to the nearest limit:

    ggplot(data = cm) + 
      geom_tile(aes(Var2, Var1, fill = value)) +
      scale_fill_gradientn(colours = rainbow(5), limits = c(0.5, 0.9),
                           oob = scales::squish)
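
To make the difference between the two out-of-bounds handlers concrete, here is a minimal sketch that applies them to a few values taken from the matrix above. The vector `x` is picked by hand for illustration; `censor` is the handler continuous scales use when `oob` is not set.

```r
library(scales)

# A few correlation values from the example, some outside [0.5, 0.9]
x <- c(0.38, 0.43, 0.85, 1.00)

# squish() clamps out-of-range values to the nearest limit
squished <- squish(x, range = c(0.5, 0.9))
print(squished)  # 0.50 0.50 0.85 0.90

# censor() replaces out-of-range values with NA, which ggplot2 draws in grey
censored <- censor(x, range = c(0.5, 0.9))
print(censored)  # NA NA 0.85 NA
```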
    


    With squish, tiles whose values fall outside [0.5, 0.9] take the color of the nearest limit instead of turning grey, and the palette is spread across the new range.
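
For readers who would rather stay with corrplot, the original call can probably be repaired along the following lines. This is a sketch under assumptions, not a tested fix for the asker's data: the matrix `M` is a hypothetical stand-in with values between 0.83 and 1, and the code assumes a recent corrplot release in which the limits argument is called `col.lim` (older releases call it `cl.lim`). Two things differ from the question: `matlab.like2` is a function, so it has to be called with a number of colors before being passed to `col`, and corrplot's margin argument is `mar`. An unknown `margin` argument is passed on to `text()` and `title()`, which is where the warnings come from.

```r
library(corrplot)
library(colorRamps)

# Hypothetical stand-in for the asker's matrix: symmetric, values in [0.83, 1]
vals <- c(0.83, 0.88, 0.91, 0.95, 0.97, 0.99)
M <- diag(4)
M[lower.tri(M)] <- vals
M[upper.tri(M)] <- t(M)[upper.tri(M)]
dimnames(M) <- list(paste0("V", 1:4), paste0("V", 1:4))

# matlab.like2() returns a vector of n colors; corrplot expects such a vector
pal <- colorRamps::matlab.like2(200)

corrplot(M, method = "number", type = "lower", is.corr = FALSE,
         col = pal, tl.col = "black", tl.cex = 0.7,
         col.lim = c(0.8, 1),   # assumption: recent corrplot; older releases use cl.lim
         mar = c(0, 0, 2, 0),   # 'mar', not 'margin'; top margin leaves room for the title
         title = "9 DAT")
```

The limits passed to `col.lim` need to cover every value in the matrix, which holds here because the data lie between 0.83 and 1. How the palette is distributed across the limits has changed between corrplot versions, so it is worth checking the color legend on your own installation.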