r ggplot2 colors scale

Color coding and legend labels in ggplot


I have data which I'd like to plot using ggplot's geom_point:

  set.seed(1)
  df <- data.frame(x=rnorm(100),y=rnorm(100),val=c(rnorm(90),rep(NA,10)))

I add colors according to intervals of df$val:

  intervals.df <- data.frame(interval=c("(-3,-2]","(-2,-0.999]","(-0.999,0]","(0,1.96]","(1.96,3.91]","(3.91,5.87]","not expressed"),
                             start=c(-3,-2,-0.999,0,1.96,3.91,NA),end=c(-2,-0.999,0,1.96,3.91,5.87,NA),
                             col=c("#2f3b61","#436CE8","#E0E0FF","#7d4343","#C74747","#EBCCD6","#D3D3D3"),stringsAsFactors=FALSE)
  # For each value, look up the color and label of the interval that contains it
  df <- cbind(df,do.call(rbind,lapply(df$val,function(x){
    if(is.na(x)){
      # NA values get the last row of intervals.df ("not expressed")
      return(data.frame(col=intervals.df$col[nrow(intervals.df)],interval=intervals.df$interval[nrow(intervals.df)]))
    } else{
      # Left-open, right-closed, matching the "(start,end]" labels,
      # so a value on a boundary matches exactly one interval
      idx <- which(intervals.df$start < x & intervals.df$end >= x)
      return(data.frame(col=intervals.df$col[idx],interval=intervals.df$interval[idx]))
    }
  })))
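
As an aside, the same lookup can be sketched without the per-row lapply by using base R's cut(). This is only an illustration under assumptions: the break points and labels are copied from intervals.df above, and the object names (val, brks, lbls, interval) are chosen here, not taken from the question.

```r
set.seed(1)
val <- c(rnorm(90), rep(NA, 10))

# Break points and labels copied from intervals.df (assumed to stay in sync with it)
brks <- c(-3, -2, -0.999, 0, 1.96, 3.91, 5.87)
lbls <- c("(-3,-2]", "(-2,-0.999]", "(-0.999,0]", "(0,1.96]", "(1.96,3.91]", "(3.91,5.87]")

# cut() assigns each value to a left-open, right-closed interval;
# NA input stays NA, and so does any value outside the break range
interval <- as.character(cut(val, breaks = brks, labels = lbls, right = TRUE))

# Only values that were NA to begin with are labelled "not expressed"
interval[is.na(val)] <- "not expressed"

# Keep every interval as a level, including ones the data never hits
interval <- factor(interval, levels = c(lbls, "not expressed"))
table(interval, useNA = "ifany")
```

Values outside (-3, 5.87] would remain NA here rather than being folded into "not expressed", which keeps out-of-range data visible.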

Here I convert df$col to a factor and use the intervals as its labels so that they appear in the legend:

  df$col <- factor(df$col,levels=intervals.df$col,labels=intervals.df$interval)

This also keeps every interval as a level, including those that df$val does not cover, which is what I want.

And here's how I try to plot it:

  library(ggplot2)
  ggplot(df,aes(x=x,y=y,colour=col))+
    geom_point(cex=2,shape=1,stroke=1)+
    labs(x="X",y="Y")+
    theme_bw()+
    theme(legend.key=element_blank(),panel.border=element_blank(),strip.background=element_blank())+
    scale_shape(solid=T)+
    scale_fill_manual(drop=FALSE,values=levels(df$col),name="DE")

Which gets me close, but the colors are not right.

So I thought this plot command would correct that (adding scale_color_manual):

  ggplot(df,aes(x=x,y=y,colour=col))+
    geom_point(cex=2,shape=1,stroke=1)+
    labs(x="X",y="Y")+
    theme_bw()+
    theme(legend.key=element_blank(),panel.border=element_blank(),strip.background=element_blank())+
    scale_shape(solid=T)+
    scale_fill_manual(drop=FALSE,values=levels(df$col),name="DE")+
    scale_color_manual(drop=FALSE,values=levels(df$col),name="DE")

But that throws the error:

Error in grDevices::col2rgb(colour, TRUE) : invalid color name '(0,1.96]'
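
The error can be reproduced without ggplot2, which may help show where it comes from. The following is a minimal sketch using a shortened, made-up subset of the colors and labels (cols, lbls and f are names chosen for this illustration): once a factor is relabelled, its levels are the labels, so they are no longer valid color strings.

```r
cols <- c("#2f3b61", "#436CE8", "#D3D3D3")
lbls <- c("(-3,-2]", "(-2,-0.999]", "not expressed")

# Same pattern as df$col: hex codes as levels, interval strings as labels
f <- factor(c("#436CE8", "#D3D3D3"), levels = cols, labels = lbls)

# After relabelling, the levels are the interval labels, not the hex codes
print(levels(f))

# Hex codes are valid colors
print(grDevices::col2rgb(cols))

# Interval labels are not, so col2rgb() raises an "invalid color name" error;
# tryCatch() captures the message instead of stopping the script
res <- tryCatch(grDevices::col2rgb(levels(f)),
                error = function(e) conditionMessage(e))
print(res)
```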

So, how do I get the colors right (and the legend name right too)?


Solution

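  • The error arises because factor(..., labels=intervals.df$interval) replaces the levels of df$col with the interval labels, so levels(df$col) no longer holds hex codes and scale_color_manual receives strings such as "(0,1.96]" as colors. (In the first attempt, scale_fill_manual has no effect because no fill aesthetic is mapped, so ggplot falls back to its default colors.) One option is to map colour to interval instead, after setting its levels from intervals.df so that the order and the number of levels are correct. Then take the colors from intervals.df, make a named vector of them, and pass it to scale_color_manual.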

    # Set levels of interval via intervals.df
    df$interval = factor(df$interval, levels=intervals.df$interval)
    
    # Named vector of the colors based on intervals.df
    colors = intervals.df$col
    names(colors) = intervals.df$interval
    
    ggplot(df, aes(x=x, y=y, colour=interval)) +
        geom_point(cex=2, shape=1, stroke=1) +
        labs(x="X", y="Y") +
        theme_bw() +
        theme(legend.key=element_blank(),
              panel.border=element_blank(),
              strip.background=element_blank()) +
        scale_color_manual(values=colors, name="DE", drop=FALSE)
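
The named vector above can also be built in one step with setNames(), and it may be worth checking that every factor level has a color before plotting. The sketch below uses a shortened, made-up stand-in for intervals.df so that it runs on its own. Because named values are looked up by name, the order of the vector does not have to match the order of the levels.

```r
# Shortened stand-in for intervals.df (assumption: same column names as above)
intervals.df <- data.frame(
  interval = c("(-3,-2]", "(0,1.96]", "not expressed"),
  col      = c("#2f3b61", "#7d4343", "#D3D3D3"),
  stringsAsFactors = FALSE
)

# setNames() returns the colors with the interval labels as names
colors <- setNames(intervals.df$col, intervals.df$interval)

# Data that uses only some of the levels; the unused level is kept
interval <- factor(c("(0,1.96]", "not expressed"),
                   levels = intervals.df$interval)

# Sanity check: every level has a matching named color
stopifnot(all(levels(interval) %in% names(colors)))

# Colors in level order, looked up by name
print(unname(colors[levels(interval)]))
```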