Tags: r, ggplot2, colors, bar-chart, scale-color-manual

ggplot2 displays wrong colors with manual scale


I am trying to plot my data with a manual color scale based on values. However, the colors that are displayed do not correspond at all to the values that I provide. My data looks like this:

# A tibble: 100 x 2
   chunk      avg  
   <dbl>    <dbl>  
 1     0  0.0202
 2     1  0.0405
 3     2  0.0648
 4     3  0.0405
 5     4  0.0283
 6     5 -0.00806
 7     6 -0.0526
 8     7 -0.0364
 9     8 -0.00810
10     9  0.0243
# ... with 90 more rows

Then I pipe it to ggplot2:

data %>%
    ggplot(
        aes(
            chunk,
            avg,
            fill = cut(
                avg,
                c(-Inf, -0.01, 0.01, Inf)
            )
        )
    ) +
    geom_bar(stat = "identity", show.legend = FALSE) +
    scale_color_manual(
        values = c(
            "(-Inf, -0.01)" = "red",
            "[-0.01, 0.01]" = "yellow",
            "(0.01, Inf)" = "green"
        )
    )

As you can see, I want to color my bars based on their values: red below -0.01, green above 0.01, and yellow in between.

This is the result I receive:

[Image: failed plot]

What am I missing?


Solution

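  • The colours differ because ggplot is not connecting the colours you supplied to the groups you supplied. Two things in your code likely contribute to this. First, the bars are mapped to the fill aesthetic, but scale_color_manual() only sets the colour aesthetic (the bar outlines), so fill falls back to the default palette. Second, the names in values must match the factor levels produced by cut() exactly, and by default those look like (-0.01,0.01] rather than [-0.01, 0.01]. One way to sidestep both issues is to build the grouping column yourself.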

    You can create a new column in the data before you send it to ggplot for plotting. We will call it colour_group, but you can call it anything. We populate this new column based on the values of avg (I have made sample data, as you haven't supplied all of yours). We use ifelse(), which tests a condition against each value and returns one result where the test is TRUE and another where it is FALSE.

    In the code below, colour_group = ifelse(avg < -0.01, 'red', NA) may be read aloud as: "If my value of avg is less than -0.01, make the value for the colour_group column 'red', otherwise make it NA". On the subsequent lines, the FALSE result is colour_group itself, so that values already set on the previous lines are kept.

    library(dplyr)
    library(ggplot2)

    # make sample data (the question did not include the full data set)
    my_data <- tibble(
      chunk = 1:100,
      avg = rnorm(100, 1, 1)
    )

    # make the new 'colour_group' column
    my_data_modified <- my_data %>%
      mutate(
        colour_group = ifelse(avg < -0.01, 'red', NA),
        colour_group = ifelse(avg > 0.01, 'green', colour_group),
        # >= and <= so that values exactly on a boundary are not left as NA
        colour_group = ifelse(avg >= -0.01 & avg <= 0.01, 'yellow', colour_group)
      )
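
The chained ifelse() calls above can also be written as a single dplyr::case_when() call. The following is a minimal sketch and not part of the original answer. It assumes dplyr is installed; the name sample_data and its five values are made up for illustration, and assigning the boundary values -0.01 and 0.01 to 'yellow' is an assumption about the intended behaviour.

```r
library(dplyr)

# Illustrative data covering all three ranges, including both boundaries
sample_data <- tibble(
  chunk = 1:5,
  avg = c(-0.05, -0.01, 0, 0.01, 0.05)
)

# case_when() checks the conditions in order and uses the first one that matches
sample_data <- sample_data %>%
  mutate(
    colour_group = case_when(
      avg < -0.01 ~ "red",
      avg > 0.01 ~ "green",
      # everything else: -0.01 to 0.01 inclusive (this also catches missing avg values)
      TRUE ~ "yellow"
    )
  )
```

Because every row falls into exactly one branch, no intermediate NA values are created and the order of the conditions documents the priority.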
    

    Now we can plot the data and specify that we want to use the colour_group column as the fill aesthetic. In scale_fill_manual(), we then tell ggplot which colour to use for each value of colour_group: bars with the value green get a green fill, and so on for the other colours.

    my_data_modified %>%
      ggplot(aes(chunk, avg, fill = colour_group))+
      geom_bar(stat = 'identity', show.legend = FALSE)+
      scale_fill_manual(
        values = c('green' = 'green', 'red' = 'red', 'yellow' = 'yellow')
      )
    

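
When the grouping column already contains valid R colour names, as colour_group does in this first version, ggplot2's scale_fill_identity() can use those values directly as fill colours. The sketch below is not from the original answer; it assumes dplyr and ggplot2 are installed, and demo_data is a made-up three-row data set.

```r
library(dplyr)
library(ggplot2)

# Made-up data whose colour_group column already holds colour names
demo_data <- tibble(
  chunk = 1:3,
  avg = c(-0.05, 0, 0.05),
  colour_group = c("red", "yellow", "green")
)

# scale_fill_identity() uses the column values as-is, with no name-to-colour lookup
p <- ggplot(demo_data, aes(chunk, avg, fill = colour_group)) +
  geom_col() +
  scale_fill_identity()
```

This only works while the column holds colour names; with labels such as low, med and high, scale_fill_manual() is still needed.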

    Having to specify each colour twice can look slightly confusing. However, the values of colour_group can be anything, such as 1, 2, 3 or low, med, high. In that case, you would use the same code but modify the ifelse() statements and change scale_fill_manual() to match the new values. For example:

    my_data_modified <- my_data %>%
      mutate(
        colour_group = ifelse(avg < -0.01, 'low', NA),
        colour_group = ifelse(avg > 0.01, 'high', colour_group),
        # >= and <= so that values exactly on a boundary are not left as NA
        colour_group = ifelse(avg >= -0.01 & avg <= 0.01, 'med', colour_group)
      )
    
    my_data_modified %>%
      ggplot(aes(chunk, avg, fill = colour_group))+
      geom_bar(stat = 'identity', show.legend = FALSE)+
      scale_fill_manual(
        values = c('high' = 'green', 'low' = 'red', 'med' = 'yellow')
      )
    

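
For comparison, the approach in the question can also be adjusted directly while keeping cut(). This sketch is not from the original answer. It switches to scale_fill_manual(), since the bars are mapped to fill, and passes explicit labels to cut() so that the names in values are guaranteed to match the factor levels. The names example_data and avg_group, the labels low, mid and high, and the random data are all illustrative assumptions.

```r
library(dplyr)
library(ggplot2)

# Made-up data on roughly the same scale as the values shown in the question
set.seed(1)
example_data <- tibble(
  chunk = 0:99,
  avg = rnorm(100, mean = 0, sd = 0.03)
)

# cut() is right-closed by default, so a value of exactly -0.01 lands in "low"
# and a value of exactly 0.01 lands in "mid"
example_data <- example_data %>%
  mutate(
    avg_group = cut(
      avg,
      breaks = c(-Inf, -0.01, 0.01, Inf),
      labels = c("low", "mid", "high")
    )
  )

# The names in `values` match the labels given to cut() above
p <- ggplot(example_data, aes(chunk, avg, fill = avg_group)) +
  geom_col(show.legend = FALSE) +
  scale_fill_manual(
    values = c("low" = "red", "mid" = "yellow", "high" = "green")
  )
```

If you prefer to keep the default interval labels, calling levels() on the result of cut() shows the exact strings to use as names in values.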