
GGPlot won't display more than 3 colors


GGPlot is only displaying the first three colors (green, yellow, and orange) with the following code:

p = ggplot(MobileOutput, aes(x=`Timestamp(UTC)`,y=`PM2.5(ug/m3)`))+
  geom_point(aes(colour = cut(`PM2.5(ug/m3)`, c(0, 12.0, 35.4, 55.4, 150.4, 250.4, 500, Inf))),
             size = 0.1) +
  ylim(0,500)  +
  theme_bw() +
  scale_color_manual(name = "PM2.5",
                     values = c("(0,12]" = "green2",
                                "(12,35.4]" = "yellow2",
                                "(35.4,55.4]" = "orange",
                                "(55.4,150.4]" = "red1",
                                "(150.4, 250.4]" = "red2",
                                "(250.4, 500]" = "red3",
                                "(500, Inf]" = "red4"))
gPlotly <- ggplotly()

All of the reds are still being plotted in a clear/white color that isn't visible. I am able to hover over the invisible data and see information on it confirming that it is being plotted (see image below). Additionally, a green, yellow, and orange dot are appearing next to their ranges in the legend, while none of the expected red dots are in the legend.

enter image description here

If I adjust the code above to include 3 ranges as follows, all of the colors appear as expected:

p = ggplot(MobileOutput, aes(x=`Timestamp(UTC)`,y=`PM2.5(ug/m3)`))+
  geom_point(aes(colour = cut(`PM2.5(ug/m3)`, c(0, 12.0, 35.4, Inf))),
             size = 0.1) +
  ylim(0,500)  +
  theme_bw() +
  scale_color_manual(name = "PM2.5",
                     values = c("(0,12]" = "green2",
                                "(12,35.4]" = "yellow2",
                                "(35.4,Inf]" = "red4"))
gPlotly <- ggplotly()

enter image description here

If I add a single additional range, anything beyond the 3rd item listed goes completely invisible again, as seen in the first image.

Is there something I can adjust to get GGPlot to support displaying more than 3 colors/ranges in the legend and plot?


Solution

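  • cut() builds its labels with only three significant digits by default (dig.lab = 3). The breakpoints 35.4 and 55.4 survive, but 150.4 and 250.4 are shortened to 150 and 250, so the actual levels are (55.4,150], (150,250] and (250,500], not the names given in scale_color_manual(). The last three names in the question also contain a space after the comma, which cut() never produces. Named values in a manual scale are matched to the levels by exact string, so only the first three names match and the remaining levels are left without a colour. You can see the labels cut() really generates: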

     table(cut(rnorm(1000,150,200), 
        breaks=c(0, 12.0, 35.4, 55.4, 150.4, 250.4, 500, Inf)))
    
     (0,12]   (12,35.4] (35.4,55.4]  (55.4,150]   (150,250]   (250,500] 
         17          31          25         194         199         282 
     (500,Inf] 
         43
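
One way to avoid the mismatch, sketched under some assumptions: ask cut() for more digits with dig.lab = 4, then name the colours from the factor levels instead of typing the labels by hand. The MobileOutput data frame below is a made-up stand-in (the real data isn't shown in the question), and band and band_colours are names introduced here only for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for MobileOutput; only the column names come from the question.
set.seed(1)
MobileOutput <- data.frame(
  `Timestamp(UTC)` = as.POSIXct("2020-01-01", tz = "UTC") + seq_len(500) * 60,
  `PM2.5(ug/m3)`   = runif(500, min = 0.1, max = 499),
  check.names = FALSE
)

breaks <- c(0, 12.0, 35.4, 55.4, 150.4, 250.4, 500, Inf)

# Default labels keep 3 significant digits, so 150.4 is printed as 150.
default_levels <- levels(cut(MobileOutput$`PM2.5(ug/m3)`, breaks))

# dig.lab = 4 keeps the decimal place in the labels.
MobileOutput$band <- cut(MobileOutput$`PM2.5(ug/m3)`, breaks, dig.lab = 4)

# Take the names from the levels themselves so they cannot drift apart.
band_colours <- setNames(
  c("green2", "yellow2", "orange", "red1", "red2", "red3", "red4"),
  levels(MobileOutput$band)
)

p <- ggplot(MobileOutput, aes(x = `Timestamp(UTC)`, y = `PM2.5(ug/m3)`)) +
  geom_point(aes(colour = band), size = 0.1) +
  ylim(0, 500) +
  theme_bw() +
  scale_color_manual(name = "PM2.5", values = band_colours)
```

Passing p explicitly, as in plotly::ggplotly(p), should then show all the bands that occur in the data. Supplying your own labels = argument to cut() would work as well; the point is only that the names in values must equal the factor levels character for character.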