Tags: r, ggplot2, plot, fill, geom-area

ggplot: How to apply geom_area() or similar to fill area under three distinct geom_line() without overlapping each independent fill?


I have a plot based on my data nd with three geom_line() layers that show the probability of death after 1 year (nd$y_et), 3 years (nd$y_tre) and 5 years (nd$y_fem), respectively, as a function of the number of resected lymph nodes (nd$n_fjernet).

Question: how can I fill each area below the three individual geom_line() of nd$y_et, y_tre, y_fem, without the fill overlapping the subsequent geom_line + fill?

I tried geom_area and geom_polygon but did not come even close to a proper solution.

Current plot

enter image description here

With

ggplot(nd, aes(x=n_fjernet)) +
  geom_line(aes(y=y_et)) +
  geom_line(aes(y=y_tre)) +
  geom_line(aes(y=y_fem)) +
  scale_x_continuous(breaks = seq(0,25,5), limits=c(0,25))

The expected output should look like this:

enter image description here

UPDATE

I applied the solution provided below, yielding

ndd %>% 
  rename(X3=y_et, X2=y_tre, X1=y_fem) %>% 
  pivot_longer(values_to="N", names_to="Variable", cols=c(X1:X3)) %>%
  ggplot(aes(x=n_fjernet, y=N, fill=Variable, colour=Variable)) +
  geom_area(position=position_identity(), alpha=.15) +
  geom_line(size=3, color="white") +
  geom_line(size=.75) +
  scale_fill_manual(values=c("#2C77BF", "#E38072", "#6DBCC3")) +
  scale_colour_manual(values=c("#2C77BF", "#E38072", "#6DBCC3")) +
  scale_x_continuous(breaks = seq(0,10,5), limits=c(0,10))

With

enter image description here

We are getting close to the intended plot, but unfortunately there are still overlapping fills. The blue fill can be seen behind the red fill, and both the blue fill and the red fill are behind the green fill.

Question: how to include the fills without overlapping?

My data nd

    nd <- structure(list(y_et = c(0.473, 0.473, 0.472, 0.471, 0.471, 0.47, 
0.47, 0.469, 0.468, 0.468, 0.467, 0.467, 0.466, 0.465, 0.465, 
0.464, 0.464, 0.463, 0.462, 0.462, 0.461, 0.461, 0.46, 0.459, 
0.459, 0.458, 0.458, 0.457, 0.456, 0.456, 0.455, 0.455, 0.454, 
0.453, 0.453, 0.452, 0.452, 0.451, 0.45, 0.45, 0.449, 0.449, 
0.448, 0.447, 0.447, 0.446, 0.446, 0.445, 0.445, 0.444, 0.443, 
0.443, 0.442, 0.442, 0.441, 0.44, 0.44, 0.439, 0.439, 0.438, 
0.438, 0.437, 0.436, 0.436, 0.435, 0.435, 0.434, 0.433, 0.433, 
0.432, 0.432, 0.431, 0.431, 0.43, 0.429, 0.429, 0.428, 0.428, 
0.427, 0.427, 0.426, 0.425, 0.425, 0.424, 0.424, 0.423, 0.423, 
0.422, 0.421, 0.421, 0.42, 0.42, 0.419, 0.419, 0.418, 0.417, 
0.417, 0.416, 0.416, 0.415), y_tre = c(0.895, 0.894, 0.894, 0.893, 
0.893, 0.893, 0.892, 0.892, 0.891, 0.891, 0.89, 0.89, 0.889, 
0.889, 0.889, 0.888, 0.888, 0.887, 0.887, 0.886, 0.886, 0.886, 
0.885, 0.885, 0.884, 0.884, 0.883, 0.883, 0.882, 0.882, 0.881, 
0.881, 0.881, 0.88, 0.88, 0.879, 0.879, 0.878, 0.878, 0.877, 
0.877, 0.876, 0.876, 0.875, 0.875, 0.875, 0.874, 0.874, 0.873, 
0.873, 0.872, 0.872, 0.871, 0.871, 0.87, 0.87, 0.869, 0.869, 
0.868, 0.868, 0.867, 0.867, 0.866, 0.866, 0.865, 0.865, 0.865, 
0.864, 0.864, 0.863, 0.863, 0.862, 0.862, 0.861, 0.861, 0.86, 
0.86, 0.859, 0.859, 0.858, 0.858, 0.857, 0.857, 0.856, 0.856, 
0.855, 0.855, 0.854, 0.854, 0.853, 0.853, 0.852, 0.852, 0.851, 
0.851, 0.85, 0.85, 0.849, 0.848, 0.848), y_fem = c(0.974, 0.974, 
0.973, 0.973, 0.973, 0.973, 0.973, 0.973, 0.972, 0.972, 0.972, 
0.972, 0.972, 0.971, 0.971, 0.971, 0.971, 0.971, 0.971, 0.97, 
0.97, 0.97, 0.97, 0.97, 0.969, 0.969, 0.969, 0.969, 0.969, 0.968, 
0.968, 0.968, 0.968, 0.968, 0.967, 0.967, 0.967, 0.967, 0.967, 
0.966, 0.966, 0.966, 0.966, 0.966, 0.965, 0.965, 0.965, 0.965, 
0.965, 0.964, 0.964, 0.964, 0.964, 0.963, 0.963, 0.963, 0.963, 
0.963, 0.962, 0.962, 0.962, 0.962, 0.961, 0.961, 0.961, 0.961, 
0.961, 0.96, 0.96, 0.96, 0.96, 0.959, 0.959, 0.959, 0.959, 0.958, 
0.958, 0.958, 0.958, 0.957, 0.957, 0.957, 0.957, 0.957, 0.956, 
0.956, 0.956, 0.956, 0.955, 0.955, 0.955, 0.955, 0.954, 0.954, 
0.954, 0.954, 0.953, 0.953, 0.953, 0.952), n_fjernet = c(0, 0.1, 
0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3, 1.4, 
1.5, 1.6, 1.7, 1.8, 1.9, 2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 
2.8, 2.9, 3, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4, 
4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5, 5.1, 5.2, 5.3, 
5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 
6.7, 6.8, 6.9, 7, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 
8, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, 9, 9.1, 9.2, 
9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9)), row.names = c(NA, -100L), class = c("data.table", 
"data.frame"))

Solution

    library(tidyverse)  # provides %>%, pivot_longer() and ggplot()

    nd %>% 
      pivot_longer(values_to="N", names_to="Variable", cols=c(y_fem:y_et)) %>% 
      ggplot(aes(x=n_fjernet, y=N, fill=Variable)) +
      geom_area()
    

    Gives

    enter image description here

    It's just a question of making your data tidy in the context of your objective. Here, your data isn't tidy because your column names contain information.
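As a small illustration of what "tidy" means here, the sketch below reshapes the first three rows of nd (copied from the question into a data frame called nd_small, a name chosen only for this example) so that the column names y_et, y_tre and y_fem become values of a single Variable column:

```r
library(tidyr)

# First three rows of nd, copied from the question
# (nd_small is a hypothetical name used only in this sketch)
nd_small <- data.frame(
  y_et      = c(0.473, 0.473, 0.472),
  y_tre     = c(0.895, 0.894, 0.894),
  y_fem     = c(0.974, 0.974, 0.973),
  n_fjernet = c(0, 0.1, 0.2)
)

# Move the three series into one value column (N) and one label column (Variable)
nd_long <- pivot_longer(
  nd_small,
  cols      = c(y_et, y_tre, y_fem),
  names_to  = "Variable",
  values_to = "N"
)

print(nd_long)
```

Each row of nd_long now holds one x value, one series name and one probability, which is the shape that aes(fill=Variable) needs.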

    In response to OP's comment below...

    Ah! That makes things slightly trickier. The default position in geom_area() is stack, which means that the height of each coloured area is the height of the corresponding variable, and the total height of the stack is the sum of the individual values. For example, at n_fjernet = 0, you have y_fem = 0.974, y_tre = 0.895 and y_et = 0.473, giving a total stack height of about 2.3. Looking at your original graphs, you want to plot each line at its raw value, and fill the gap between that and its next lowest companion, right?
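To make the stacking arithmetic concrete, here is a minimal base-R sketch using the first row of nd as posted in the question (n_fjernet = 0). The names vals, stack_top and identity_top are chosen only for this example:

```r
# Values of the three series at n_fjernet = 0, copied from the first row of nd
vals <- c(y_et = 0.473, y_tre = 0.895, y_fem = 0.974)

# With position = "stack", each band is as tall as its own value,
# so the top of the stack is the sum of all three values
stack_top <- sum(vals)
print(stack_top)  # 2.342

# With position = "identity", each area runs from 0 up to its own value,
# so the tallest area only reaches the largest single value
identity_top <- max(vals)
print(identity_top)  # 0.974
```

This is why the stacked plot's y-axis extends well beyond 1 even though every individual probability is below 1.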

    In principle, that's easy. You can just set position=position_identity() in your geom_area(). But if that's going to work the way you want it to, you need to track the order of the values of your variables manually. For example, with your data, we get:

    nd %>% 
      pivot_longer(values_to="N", names_to="Variable", cols=c(y_fem:y_et)) %>% 
      ggplot(aes(x=n_fjernet, y=N, fill=Variable)) + 
      geom_area(position=position_identity())
    

    enter image description here

    Not at all what you want.

    One really hacky way of getting the right result in this particular instance is

    nd %>% 
      rename(X3=y_et, X2=y_tre, X1=y_fem) %>% 
      pivot_longer(values_to="N", names_to="Variable", cols=c(X1:X3)) %>% 
      ggplot(aes(x=n_fjernet, y=N, fill=Variable)) +   
      geom_area(position=position_identity())
    

    enter image description here

    You can also control the order in which the areas are plotted by customising the scale used to create the fills, as described here.
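A less hacky variant of the same idea is to set the factor levels of Variable explicitly instead of renaming columns. This is a sketch that relies on ggplot2 painting groups in the order of those levels. nd_small is a three-row excerpt of nd used only for illustration:

```r
library(tidyr)
library(ggplot2)

# Three-row excerpt of nd from the question (hypothetical name nd_small)
nd_small <- data.frame(
  y_et      = c(0.473, 0.473, 0.472),
  y_tre     = c(0.895, 0.894, 0.894),
  y_fem     = c(0.974, 0.974, 0.973),
  n_fjernet = c(0, 0.1, 0.2)
)

nd_long <- pivot_longer(
  nd_small,
  cols      = c(y_et, y_tre, y_fem),
  names_to  = "Variable",
  values_to = "N"
)

# Put the tallest series first so it is painted first,
# and the shorter series are then painted on top of it
nd_long$Variable <- factor(nd_long$Variable,
                           levels = c("y_fem", "y_tre", "y_et"))

p <- ggplot(nd_long, aes(x = n_fjernet, y = N, fill = Variable)) +
  geom_area(position = position_identity())

print(p)
```

The level order still has to be chosen by hand, so this only tidies up the renaming trick; it does not remove the need to know which series is largest.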

    Another option would be to use geom_ribbon rather than geom_area. But whichever method you use, I don't know how you can do it without manually controlling the order in which the fills are created. That seems to be an inevitable consequence of wanting values plotted in their absolute position and also filling the area beneath. The only possibility I can think of would be to set an alpha value of less than 1 for each fill. But, personally, I think that looks ugly:

    nd %>% 
      pivot_longer(values_to="N", names_to="Variable", cols=c(y_fem:y_et)) %>% 
      ggplot(aes(x=n_fjernet, y=N, fill=Variable)) + 
      geom_area(position=position_identity(), alpha=0.4)
    

    enter image description here
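The geom_ribbon option mentioned above could look roughly like the sketch below. It assumes that y_et <= y_tre <= y_fem holds at every x (true for the rows shown), and the legend labels "1 year", "3 years" and "5 years" are chosen here from the question's description. Because each ribbon starts where the one below it ends, semi-transparent fills do not overlap:

```r
library(ggplot2)

# Three-row excerpt of nd from the question (hypothetical name nd_small)
nd_small <- data.frame(
  y_et      = c(0.473, 0.473, 0.472),
  y_tre     = c(0.895, 0.894, 0.894),
  y_fem     = c(0.974, 0.974, 0.973),
  n_fjernet = c(0, 0.1, 0.2)
)

p <- ggplot(nd_small, aes(x = n_fjernet)) +
  # Each band runs from the curve below it (ymin) to its own curve (ymax)
  geom_ribbon(aes(ymin = 0,     ymax = y_et,  fill = "1 year"),  alpha = 0.4) +
  geom_ribbon(aes(ymin = y_et,  ymax = y_tre, fill = "3 years"), alpha = 0.4) +
  geom_ribbon(aes(ymin = y_tre, ymax = y_fem, fill = "5 years"), alpha = 0.4) +
  # Draw the three curves on top of the bands
  geom_line(aes(y = y_et)) +
  geom_line(aes(y = y_tre)) +
  geom_line(aes(y = y_fem)) +
  labs(y = "N", fill = "Variable")

print(p)
```

The order of the series is still hard-coded in the ymin/ymax pairs, so this works on wide data and does not generalise to series that cross.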

    And what will you do if the order of the variables changes as you move along the x-axis? Personally, I'd drop the fill and just use different coloured lines. But it's your call.
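Whether the order changes along the x-axis can be checked before choosing a drawing order. The following base-R sketch does that for a three-row excerpt of nd. The names nd_small, order_is_stable and crossing are hypothetical, and the crossing data is invented purely to show the failing case:

```r
# Three-row excerpt of nd from the question (hypothetical name nd_small)
nd_small <- data.frame(
  y_et      = c(0.473, 0.473, 0.472),
  y_tre     = c(0.895, 0.894, 0.894),
  y_fem     = c(0.974, 0.974, 0.973),
  n_fjernet = c(0, 0.1, 0.2)
)

# TRUE only if y_et <= y_tre <= y_fem holds in every row
order_is_stable <- with(nd_small, all(y_et <= y_tre & y_tre <= y_fem))
print(order_is_stable)  # TRUE

# Invented example of two series that cross between the first and second x value
crossing <- data.frame(a = c(0.2, 0.6), b = c(0.5, 0.4))
crossing_is_stable <- with(crossing, all(a <= b))
print(crossing_is_stable)  # FALSE
```

If the check returns FALSE, a fixed drawing order will hide part of one series somewhere along the axis, which is the situation where plain coloured lines are the safer choice.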

    If anyone has a better option, I'd be interested to see it.

    Edit 2: To answer OP's question about manual control of colours:

    nd %>% 
      rename(X3=y_et, X2=y_tre, X1=y_fem) %>% 
      pivot_longer(values_to="N", names_to="Variable", cols=c(X1:X3)) %>% 
      ggplot(aes(x=n_fjernet, y=N, fill=Variable, colour=Variable)) + 
      geom_area(position=position_identity()) +
      geom_line() +
      scale_fill_manual(values=c("pink", "darkseagreen2", "steelblue2")) +
      scale_colour_manual(values=c("red", "green4", "blue"))
    

    gives me

    enter image description here

    My code is pretty similar to yours as far as I can see, so I'm not sure why it works for me and not for you. [Did you remember to put colour=Variable inside aes()?]

    I get my colours from here.
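If the colours end up attached to the wrong series, one thing that may help is passing named vectors to scale_fill_manual() and scale_colour_manual(), so that each colour is matched to a series by name rather than by position. This is a sketch on a three-row excerpt of nd. The names nd_small, fill_cols and line_cols are chosen only for this example, and the original column names are kept instead of X1 to X3:

```r
library(tidyr)
library(ggplot2)

# Three-row excerpt of nd from the question (hypothetical name nd_small)
nd_small <- data.frame(
  y_et      = c(0.473, 0.473, 0.472),
  y_tre     = c(0.895, 0.894, 0.894),
  y_fem     = c(0.974, 0.974, 0.973),
  n_fjernet = c(0, 0.1, 0.2)
)

nd_long <- pivot_longer(
  nd_small,
  cols      = c(y_et, y_tre, y_fem),
  names_to  = "Variable",
  values_to = "N"
)

# Tallest series first, so shorter areas are painted on top
nd_long$Variable <- factor(nd_long$Variable,
                           levels = c("y_fem", "y_tre", "y_et"))

# Named vectors: colours are matched to the levels by name, not by position
fill_cols <- c(y_fem = "pink", y_tre = "darkseagreen2", y_et = "steelblue2")
line_cols <- c(y_fem = "red",  y_tre = "green4",        y_et = "blue")

p <- ggplot(nd_long, aes(x = n_fjernet, y = N,
                         fill = Variable, colour = Variable)) +
  geom_area(position = position_identity()) +
  geom_line() +
  scale_fill_manual(values = fill_cols) +
  scale_colour_manual(values = line_cols)

print(p)
```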

    You mentioned geom_point in your comment. Was that a typo?

    By the way, we didn't need all 100 rows of data to solve this. Half a dozen would have been enough. A dozen at most. Maybe next time... ;)