I am trying to plot my data with a manual color scale based on values. However, the colors that are displayed don't correspond at all to the values I provide. My data looks like this:
# A tibble: 100 x 2
chunk avg
<dbl> <dbl>
1 0 0.0202
2 1 0.0405
3 2 0.0648
4 3 0.0405
5 4 0.0283
6 5 -0.00806
7 6 -0.0526
8 7 -0.0364
9 8 -0.00810
10 9 0.0243
# ... with 90 more rows
Then I pipe it to ggplot2:
data %>%
  ggplot(
    aes(
      chunk,
      avg,
      fill = cut(
        avg,
        c(-Inf, -0.01, 0.01, Inf)
      )
    )
  ) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  scale_color_manual(
    values = c(
      "(-Inf, -0.01)" = "red",
      "[-0.01, 0.01]" = "yellow",
      "(0.01, Inf)" = "green"
    )
  )
As you can see, I want to color my bars based on their values: below -0.01 red, above 0.01 green, and in between yellow.
This is the result I receive:
What am I missing?
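You are getting different colours because ggplot isn't making a connection between the colours you have supplied and the groups you have supplied. There are two reasons for this. First, the bars are coloured with the fill aesthetic, so scale_color_manual() (which controls the colour aesthetic) has no effect on them. Second, the names in values have to match the factor levels produced by cut() exactly, and yours don't: for example, cut() names its first interval (-Inf,-0.01], not (-Inf, -0.01). Fixing both should make your original code work. Alternatively, here is a solution that avoids cut() altogether.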
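Besides the scale mismatch (the bars use fill, while scale_color_manual() sets colour), a quick way to see why the names never match is to print the levels that cut() actually creates, since scale_fill_manual() matches the names in values against these exact strings. The three input values below are arbitrary sample data.

```r
# Breaks used in the question
breaks <- c(-Inf, -0.01, 0.01, Inf)

# Levels produced by cut() with its default right = TRUE (intervals closed on the right);
# the factor keeps all three levels regardless of which values are present
lv <- levels(cut(c(-0.05, 0, 0.05), breaks))
print(lv)

# Check whether the names used in the question's scale appear among those levels
question_names <- c("(-Inf, -0.01)", "[-0.01, 0.01]", "(0.01, Inf)")
print(question_names %in% lv)
```

All three checks print FALSE, so the manual scale has nothing to attach its colours to.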
You can create a new column in the data before you send it to ggplot for plotting. We will call it colour_group, but you can call it anything. We populate this new column based on the values of avg (I have made sample data as you haven't supplied all of yours). We use ifelse(), which tests a condition against the data and returns one value if the test is TRUE and another if it is FALSE.
In the code below, colour_group = ifelse(avg < -0.01, 'red', NA) may be read aloud as: "If my value of avg is less than -0.01, make the value for the colour_group column 'red', otherwise make it NA". For the subsequent lines, we want the FALSE result to keep the values already in the colour_group column, i.e. the ones made on the previous lines.
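To see the chained ifelse() calls in isolation, here is the same logic on a plain numeric vector (the five values are made up for illustration and include both boundary points). The middle test uses >= and <=, so values of exactly -0.01 or 0.01 end up 'yellow' rather than staying NA.

```r
# Hypothetical sample values, including both boundary points
avg <- c(-0.05, -0.01, 0, 0.01, 0.05)

# Step 1: below -0.01 is 'red', everything else starts as NA
colour_group <- ifelse(avg < -0.01, 'red', NA)
# Step 2: above 0.01 is 'green'; a FALSE test keeps the previous result
colour_group <- ifelse(avg > 0.01, 'green', colour_group)
# Step 3: the middle band (boundaries included) is 'yellow'
colour_group <- ifelse(avg >= -0.01 & avg <= 0.01, 'yellow', colour_group)

print(colour_group)
# "red" "yellow" "yellow" "yellow" "green"
```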
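# load the packages used below
library(dplyr)
library(ggplot2)

# make sample data (centred on 0 like yours, so all three colour groups appear)
set.seed(1)
my_data <- tibble(
  chunk = 1:100,
  avg = rnorm(100, 0, 0.03)
)

# make the new 'colour_group' column
# (>= and <= in the last test, so values of exactly -0.01 or 0.01 are not left as NA)
my_data_modified <- my_data %>%
  mutate(
    colour_group = ifelse(avg < -0.01, 'red', NA),
    colour_group = ifelse(avg > 0.01, 'green', colour_group),
    colour_group = ifelse(avg >= -0.01 & avg <= 0.01, 'yellow', colour_group)
  )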
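As an alternative to chaining ifelse(), dplyr's case_when() evaluates its conditions in order and uses the first one that is TRUE, which expresses the three bands in a single call. This is a minimal sketch assuming the same columns (chunk and avg); the sample data is made up.

```r
library(dplyr)

# Hypothetical sample data with the same columns as in the question
my_data <- tibble(
  chunk = 0:4,
  avg = c(-0.05, -0.01, 0, 0.01, 0.05)
)

# case_when() checks the conditions top to bottom and the first TRUE one wins,
# so the final TRUE branch catches whatever is left (here, -0.01 to 0.01 inclusive)
my_data_modified <- my_data %>%
  mutate(
    colour_group = case_when(
      avg < -0.01 ~ 'red',
      avg > 0.01 ~ 'green',
      TRUE ~ 'yellow'
    )
  )

print(my_data_modified)
```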
Now we can plot the data, and specify that we want to use the colour_group column as the fill aesthetic. In scale_fill_manual we then tell ggplot that if the colour_group column holds the value green, we want the bar to be a green colour, and so on for the other colours.
my_data_modified %>%
  ggplot(aes(chunk, avg, fill = colour_group)) +
  geom_bar(stat = 'identity', show.legend = FALSE) +
  scale_fill_manual(
    values = c('green' = 'green', 'red' = 'red', 'yellow' = 'yellow')
  )
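Because colour_group in this version already holds valid R colour names, another option is scale_fill_identity(), which uses the column's values directly as the fill colours instead of mapping them through a named vector. This only works when the column contains valid colour names. A minimal sketch with made-up data; layer_data() is used only to check which fill each bar received.

```r
library(dplyr)
library(ggplot2)

# Hypothetical sample data where colour_group already contains colour names
my_data_modified <- tibble(
  chunk = 0:4,
  avg = c(-0.05, -0.01, 0, 0.01, 0.05),
  colour_group = c('red', 'yellow', 'yellow', 'yellow', 'green')
)

# scale_fill_identity() treats each value of colour_group as the colour itself
p <- my_data_modified %>%
  ggplot(aes(chunk, avg, fill = colour_group)) +
  geom_bar(stat = 'identity', show.legend = FALSE) +
  scale_fill_identity()

print(p)

# Inspect the fill colours actually assigned to the bars
print(layer_data(p)$fill)
```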
It is slightly confusing, in a way, to have to specify the colour twice. However, the values of colour_group could be anything, such as '1', '2', '3' or 'low', 'med', 'high' (as strings, so that fill stays a discrete scale). In that case you would use the same code, but modify the ifelse statements and change scale_fill_manual to match these values. For example:
my_data_modified <- my_data %>%
  mutate(
    colour_group = ifelse(avg < -0.01, 'low', NA),
    colour_group = ifelse(avg > 0.01, 'high', colour_group),
    colour_group = ifelse(avg >= -0.01 & avg <= 0.01, 'med', colour_group)
  )

my_data_modified %>%
  ggplot(aes(chunk, avg, fill = colour_group)) +
  geom_bar(stat = 'identity', show.legend = FALSE) +
  scale_fill_manual(
    values = c('high' = 'green', 'low' = 'red', 'med' = 'yellow')
  )
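It is also possible to keep the question's cut() approach. In this minimal sketch with made-up sample data, the scale is scale_fill_manual() rather than scale_color_manual(), because the bars are coloured via fill. Giving cut() explicit labels means the names in values are guaranteed to match the factor levels. The labels 'low', 'mid' and 'high' are arbitrary choices.

```r
library(dplyr)
library(ggplot2)

# Hypothetical sample data in the same shape as the question's
set.seed(42)
my_data <- tibble(
  chunk = 0:99,
  avg = rnorm(100, 0, 0.03)
)

p <- my_data %>%
  ggplot(aes(
    chunk,
    avg,
    # explicit labels so the names in `values` below match the levels exactly;
    # with the default right = TRUE, a value of exactly -0.01 falls in 'low'
    # and exactly 0.01 falls in 'mid'
    fill = cut(avg, c(-Inf, -0.01, 0.01, Inf), labels = c('low', 'mid', 'high'))
  )) +
  geom_bar(stat = 'identity', show.legend = FALSE) +
  # fill scale (not colour), because the bars are coloured via `fill`
  scale_fill_manual(values = c('low' = 'red', 'mid' = 'yellow', 'high' = 'green'))

print(p)
```

If a level had no matching name in values, its bars would be drawn in the scale's NA colour instead, which makes naming mistakes easy to spot.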