Tags: r, ggplot2, rescale

Avoid rescaling while binning using scale_*_steps


I have continuous data that I'd like to display using a binned color scale. Since the data is unevenly distributed, I want more breaks at the lower end of the range to emphasize differences among the low values. However, when binning, scale_fill_stepsn() appears to rescale the data automatically (see the note about the rescaler argument in the documentation), so the color palette reflects the relative position of the breaks. This makes differences among the low values hard to distinguish. I'd like the palette to be evenly distributed across my defined breaks. Is there any way to do this while still using scale_fill_stepsn()?

I know I could manually bin my data and then pass it into ggplot as discrete, but I'd like to avoid this. I also want to avoid doing any transformation of the data (such as taking the log).

library(ggplot2)

# Generate sample data with a few outliers
set.seed(123)  # make the random values reproducible
df <- expand.grid(x = 0:5, y = 0:5)
df$z <- abs(rnorm(36))
df$z[[4]] <- 12
df$z[[8]] <- 17
df$z[[30]] <- 7
breaks <- c(0, 0.25, 0.5, 1, 2, 5, 10, 20)

ggplot(df) +
  geom_tile(aes(x=x, y=y, fill=z)) + 
  scale_fill_stepsn(colors=terrain.colors(7),
                    breaks=breaks)

[Figure: tile plot in which most of the tiles are very similar shades of green]


Solution

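  • You can use the values= argument to control where the colors are positioned along the gradient. Passing the breaks rescaled to [0, 1] spreads the palette evenly across the bins instead of across the data range. According to the docs: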

    if colours should not be evenly positioned along the gradient this vector gives the position (between 0 and 1) for each colour in the colours vector.

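    library(ggplot2)

    # Recreate the sample data from the question
    set.seed(123)
    df <- expand.grid(x = 0:5, y = 0:5)
    df$z <- abs(rnorm(36))
    df$z[c(4, 8, 30)] <- c(12, 17, 7)

    breaks <- c(0, 0.25, 0.5, 1, 2, 5, 10, 20)

    ggplot(df) +
      geom_tile(aes(x = x, y = y, fill = z)) +
      scale_fill_stepsn(
        colors = terrain.colors(7),
        breaks = breaks,
        values = scales::rescale(breaks),  # break positions rescaled to [0, 1]
        limits = range(breaks)             # make the scale span the full break range
      )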
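
To see why the default looks so uniform, and what scales::rescale(breaks) actually passes to values=, here is a small sketch in base R. It assumes, as far as I understand ggplot2's binned scales, that each bin's color is looked up at the bin's midpoint after the limits have been rescaled to [0, 1]. The variable names (values, mid) are only illustrative.

```r
# Breaks from the question
breaks <- c(0, 0.25, 0.5, 1, 2, 5, 10, 20)

# Base-R equivalent of scales::rescale(breaks) with default arguments:
# linearly map range(breaks) onto [0, 1]
values <- (breaks - min(breaks)) / (max(breaks) - min(breaks))
print(values)  # 0, 0.0125, 0.025, 0.05, 0.1, 0.25, 0.5, 1

# Midpoint of each bin in rescaled space (assumed lookup position
# for the bin's color when values = NULL)
mid <- values[-1] - diff(values) / 2
print(mid)

# Number of bins whose midpoint lies in the lowest tenth of the gradient
print(sum(mid < 0.1))
```

Four of the seven bin midpoints fall in the lowest tenth of the gradient, which would explain why those bins get nearly the same green unless values= is supplied.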
    
