Understanding color scales in ggplot2


There are so many ways to define colour scales within ggplot2. After just loading ggplot2 I count 22 functions beginning with scale_color_* (or scale_colour_*) and the same number beginning with scale_fill_*. Is it possible to briefly name the purpose of the functions below? In particular, I struggle with the differences between some of the functions and when to use them.

  • scale_*_binned()
  • scale_*_brewer()
  • scale_*_continuous()
  • scale_*_date()
  • scale_*_datetime()
  • scale_*_discrete()
  • scale_*_distiller()
  • scale_*_fermenter()
  • scale_*_gradient()
  • scale_*_gradient2()
  • scale_*_gradientn()
  • scale_*_grey()
  • scale_*_hue()
  • scale_*_identity()
  • scale_*_manual()
  • scale_*_ordinal()
  • scale_*_steps()
  • scale_*_steps2()
  • scale_*_stepsn()
  • scale_*_viridis_b()
  • scale_*_viridis_c()
  • scale_*_viridis_d()

What I tried

I've tried to do some research on the web, but the more I read, the more confused I get. To pick a random example: "The default scale for continuous fill scales is scale_fill_continuous() which in turn defaults to scale_fill_gradient()". I do not understand what the difference between the two functions is. Again, this is just an example. The same is true for scale_color_binned() and scale_color_discrete(), where I cannot name the difference. And in the case of scale_color_date() and scale_color_datetime(), the description says "scale_*_gradient creates a two colour gradient (low-high), scale_*_gradient2 creates a diverging colour gradient (low-mid-high), scale_*_gradientn creates a n-colour gradient.", which is nice to know, but how is this related to scale_color_date() and scale_color_datetime()? Looking for those functions on the web does not turn up very informative sources either. Reading on this topic also gets chaotic because there are tons of color palettes in different packages which are sequential/diverging/qualitative, plus one can set the same color in different ways, i.e. by color name, rgb, number, hex code or palette name. In part this is not directly related to the question about the 2*22 functions, but in some cases it is, because providing a "wrong" palette results in an error (e.g. the error "Continuous value supplied to discrete scale").

Why I ask this

I need to make many plots for my work and I am supposed to provide a function that returns all kinds of plots. The plots are supposed to have a similar layout so that they fit well together. One aspect I need to consider here is that the colour scales of the plots go well together. See here for example, where so many different kinds of plots have the same colour scale. I was hoping I could use some general function which provides a colour palette for any data, regardless of whether the data is continuous or categorical, or whether it is a fill or col aesthetic. But since this is not how colour scales are defined in ggplot2, I need to understand what all those functions are good for.


Solution

  • This is a good question... and I would have hoped there would be a practical guide somewhere. One could question if SO would be a good place to ask this question, but regardless, here's my attempt to summarize the various scale_color_*() and scale_fill_*() functions built into ggplot2. Here, we'll describe the range of functions using scale_color_*(); however, the same general rules will apply for scale_fill_*() functions.

    Overall Categorization

    There are 22 functions in all, but happily we can group them intelligently based on practical usage scenarios. There are three key criteria that can be used to define practically how to use each of the scale_color_*() functions:

    1. Nature of the mapping data. Is the data mapped to the color aesthetic discrete or continuous? CONTINUOUS data is something that can be explained via real numbers: time, temperature, lengths - these are all continuous because even if your observations are 1 and 2, there can exist something that would have a theoretical value of 1.5. DISCRETE data is just the opposite: you cannot express this data via real numbers. Take, for example, if your observations were: "Model A" and "Model B". There is no obvious way to express something in-between those two. As such, you can only represent these as single colors or numbers.

    2. The Colorspace. The color palette used to draw onto the plot. By default, ggplot2 uses a palette of evenly-spaced hue values for discrete data and a dark-to-light blue gradient for continuous data. There are other functions built into the library that use either Brewer palettes or Viridis colorspaces.

    3. The level of Specification. Generally, once you have defined if the scale function is continuous and in what colorspace, you have variation on the level of control or specification the user will need or can specify. A good example of this is the functions: *_continuous(), *_gradient(), *_gradient2(), and *_gradientn().
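
    To check the first criterion in practice, here is a small sketch. It lists the color scale functions in whatever ggplot2 version is installed (the exact set may differ from the 22 named in the question) and uses ggplot2's exported scale_type() helper to ask which scale family is picked by default for a given vector. The variable name color_scales is only for illustration.

```r
library(ggplot2)

# List the scale_color_*() functions exported by the installed ggplot2 version.
# The exact set depends on the version you have.
color_scales <- ls("package:ggplot2", pattern = "^scale_color_")
print(color_scales)

# scale_type() reports which scale family ggplot2 chooses by default
# for a vector of a given class.
print(scale_type(mtcars$cyl))             # numeric column
print(scale_type(as.factor(mtcars$cyl)))  # same values, converted to a factor
print(scale_type(economics$date))         # Date column
```

    The numeric column is reported as continuous and the factor as discrete, which is why the same cyl values can end up on two different kinds of color scale.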

    Continuous Scales

    We can start off with continuous scales. These functions are all used when applied to observations that are continuous variables (see above). The functions here can further be defined if they are either binned or not binned. "Binning" is just a way of grouping ranges of a continuous variable to all be assigned to a particular color. You'll notice the effect of "binning" is to change the legend keys from a "colorbar" to a "steps" legend.

    The continuous example (colorbar legend):

    library(ggplot2)
    cont <- ggplot(mtcars, aes(mpg, disp, color=cyl)) + geom_point(size=4)
    
    cont + scale_color_continuous()
    

    The binned example (color steps legend):

    cont + scale_color_binned()
    

    The following are continuous functions.

    Name of Function          Colorspace    Legend      What it does
    scale_color_continuous()  default       Colorbar    basic scale (as if you did nothing)
    scale_color_gradient()    user-defined  Colorbar    define the colors for the low and high ends
    scale_color_gradient2()   user-defined  Colorbar    define the colors for low, mid, and high
    scale_color_gradientn()   user-defined  Colorbar    define any number of colors along the scale
    scale_color_binned()      default       Colorsteps  basic scale, but binned
    scale_color_steps()       user-defined  Colorsteps  define the colors for the low and high ends
    scale_color_steps2()      user-defined  Colorsteps  define the colors for low, mid, and high
    scale_color_stepsn()      user-defined  Colorsteps  define any number of colors along the scale
    scale_color_viridis_c()   Viridis       Colorbar    Viridis color scale. Change palette via option=.
    scale_color_viridis_b()   Viridis       Colorsteps  Viridis color scale, binned. Change palette via option=.
    scale_color_distiller()   Brewer        Colorbar    Brewer color scale. Change palette via palette=.
    scale_color_fermenter()   Brewer        Colorsteps  Brewer color scale, binned. Change palette via palette=.
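
    As a rough illustration of the "user-defined" rows, the sketch below applies the gradient family and one steps function to the same cont plot used earlier. The specific colors and the midpoint of 6 are arbitrary choices for this example, not recommendations.

```r
library(ggplot2)

cont <- ggplot(mtcars, aes(mpg, disp, color = cyl)) + geom_point(size = 4)

# Two-color gradient: only the colors at the low and high ends are given.
p_gradient <- cont + scale_color_gradient(low = "yellow", high = "red")

# Diverging gradient: low, mid, and high colors, with mid placed at cyl = 6.
p_gradient2 <- cont +
  scale_color_gradient2(low = "blue", mid = "grey90", high = "red", midpoint = 6)

# n-color gradient: any number of colors, spread evenly over the data range.
p_gradientn <- cont +
  scale_color_gradientn(colors = c("navy", "forestgreen", "gold", "red"))

# Binned counterpart of scale_color_gradient(): same arguments, steps legend.
p_steps <- cont + scale_color_steps(low = "yellow", high = "red")

print(p_gradient2)
```

    Swapping scale_color_* for scale_fill_* gives the equivalent fill scales with the same arguments.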

    Discrete Scales

    These discrete scales apply only when the data mapped is discrete (see above). Since the nature and colors of discrete scales are more disjointed by definition, these tend to be more manually-defined. We can use the same mtcars example and "force" a discrete scale applied to the color by mapping to cyl defined as.factor():

    discrete <- ggplot(mtcars, aes(mpg, disp, color=as.factor(cyl))) + geom_point(size=4)
    discrete
    

    The following are discrete scale functions:

    Name of Function         What it does
    scale_color_discrete()   The basic default. Evenly-spaced hues.
    scale_color_hue()        Same as scale_color_discrete(), but you can define the range of hues and colors used.
    scale_color_grey()       Uses a greyscale. Can define the range.
    scale_color_manual()     Must define specifically every color used. You can apply to your mapping by supplying a named vector for values=.
    scale_color_identity()   A special-case function where your data is made up of names of colors, not names of factor levels.
    scale_color_brewer()     The discrete version of the Brewer colorspaces. Change palette via palette=.
    scale_color_viridis_d()  The discrete version of the Viridis colorspaces. Change palette via option=.
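
    A short sketch of three of these, reusing the discrete plot from above. The hex codes, the cyl_colors vector, and the point_color column are made up for this example.

```r
library(ggplot2)

discrete <- ggplot(mtcars, aes(mpg, disp, color = as.factor(cyl))) +
  geom_point(size = 4)

# Manual: a named vector ties each factor level to one specific color.
cyl_colors <- c("4" = "#1b9e77", "6" = "#d95f02", "8" = "#7570b3")
p_manual <- discrete + scale_color_manual(values = cyl_colors)

# Grey: a greyscale whose range is set with start= and end=.
p_grey <- discrete + scale_color_grey(start = 0.2, end = 0.8)

# Identity: the data column already holds color names, so they are used as-is.
# (point_color is a hypothetical column added just for this example.)
cars <- transform(mtcars, point_color = ifelse(am == 1, "tomato", "steelblue"))
p_identity <- ggplot(cars, aes(mpg, disp, color = point_color)) +
  geom_point(size = 4) +
  scale_color_identity()

print(p_manual)
```

    Using a named vector with scale_color_manual() is one way to keep the same level-to-color assignment across several plots, even when a plot only shows a subset of the levels.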

    Viridis and Brewer Scales

    A final note, you'll see above defined the functions for Brewer and Viridis palette options. Each one of these contain a few color palettes chosen to better represent ordered and non-ordered data based on some color theory. It's useful to do a little research in color theory applied to data visualization on your own. There are discrete, continuous, and binned versions of each of the two function classes, and each one has a slightly different method to change the specific palette. You'll have to Google around a bit for some representations of each scale to get a feel for them, but useful usage notes include:

    Colorspace  Discrete version         Continuous version       Binned version
    Brewer      scale_color_brewer()     scale_color_distiller()  scale_color_fermenter()
    Viridis     scale_color_viridis_d()  scale_color_viridis_c()  scale_color_viridis_b()
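
    The six functions in this table can be tried side by side with a sketch like the following. The palette names "Set1" and "YlOrRd" and the option "magma" are just examples of valid choices.

```r
library(ggplot2)

cont <- ggplot(mtcars, aes(mpg, disp, color = cyl)) + geom_point(size = 4)
discrete <- ggplot(mtcars, aes(mpg, disp, color = as.factor(cyl))) +
  geom_point(size = 4)

# Brewer: the palette is chosen by name via palette=.
p_brewer_d <- discrete + scale_color_brewer(palette = "Set1")
p_brewer_c <- cont + scale_color_distiller(palette = "YlOrRd")
p_brewer_b <- cont + scale_color_fermenter(palette = "YlOrRd")

# Viridis: the palette is chosen via option=.
p_viridis_d <- discrete + scale_color_viridis_d(option = "magma")
p_viridis_c <- cont + scale_color_viridis_c(option = "magma")
p_viridis_b <- cont + scale_color_viridis_b(option = "magma")

print(p_viridis_b)
```

    Note that the discrete versions go on the discrete plot and the continuous and binned versions go on the cont plot; mixing them up produces the errors discussed at the end of this answer.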

    One final note here: scale_color_ordinal() looks the same as scale_color_viridis_d(). As far as I can tell, it is the scale ggplot2 applies by default to ordered factors, and its default palette is the discrete Viridis one, so it effectively acts as a wrapper.

    Date Scales

    The final two more esoteric functions are the ones related to date and datetime. These functions are scale_color_date() and scale_color_datetime(), respectively. They are basically the same as the scale_color_continuous() function, but with some convenience wrappers for labeling dates. This is the same relationship that scale_x_date() has with scale_x_continuous().

    ggplot(economics, aes(x=date, y=unemploy, fill=date)) + geom_col() + scale_fill_date()
    

    The graphical result is the same as with scale_fill_continuous(), but note the benefit of scale_fill_date(): the legend labels are formatted correctly as dates. Compare with the plain continuous scale:

    ggplot(economics, aes(x=date, y=unemploy, fill=date)) + geom_col() + scale_fill_continuous()
    

    It all makes sense...

    Given all of the above, the following error messages, which you have probably seen before, should now make sense:

    > discrete + scale_color_continuous()
    Error: Discrete value supplied to continuous scale
    
    > cont + scale_color_discrete()
    Error: Continuous value supplied to discrete scale
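
    Coming back to the wish for one function that works for any data: ggplot2 has no such scale, but a small wrapper can choose between a continuous and a discrete scale for you. The helper below, pick_color_scale(), is hypothetical (it is not part of ggplot2) and only a minimal sketch. It assumes the Viridis palettes are acceptable for both cases and treats everything non-numeric as discrete.

```r
library(ggplot2)

# Hypothetical helper, not part of ggplot2: returns a Viridis color scale
# that matches the type of the variable mapped to color.
pick_color_scale <- function(x, ...) {
  if (is.numeric(x)) {
    scale_color_viridis_c(...)  # continuous data: colorbar legend
  } else {
    scale_color_viridis_d(...)  # factors, characters, logicals: discrete legend
  }
}

cont <- ggplot(mtcars, aes(mpg, disp, color = cyl)) + geom_point(size = 4)
discrete <- ggplot(mtcars, aes(mpg, disp, color = as.factor(cyl))) +
  geom_point(size = 4)

# The same call now works for both plots without a scale mismatch error.
p_cont <- cont + pick_color_scale(mtcars$cyl, option = "magma")
p_disc <- discrete + pick_color_scale(as.factor(mtcars$cyl), option = "magma")

print(p_disc)
```

    Because both branches draw from the same Viridis option, plots of continuous and categorical data end up with a consistent look. A fill version would follow the same pattern with scale_fill_viridis_c() and scale_fill_viridis_d().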