Search code examples
rggplot2geom-ribbon

plotting proportions with geom_ribbon


I'm trying to show how changes in the proportions of different age classes have changed through time. Presently, for all the data, I have the proportion of individuals at a given age (pAge), and cumulative sums of the proportions that provide the min/max values for each age (cpAgeMin and cpAgeMax) that I'm using in geom_ribbon.

Unfortunately, even though all the proportions sum to 1, my plot doesn't seem to properly reflect the data as there are open spots without data in several years. These don't seem to be issues with the data, but rather something ggplot is doing. How do I fix this so that the entire area from 0-1 in every year is assigned to one age class.

# subset of data
df <- structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), 
                                  levels = "G", class = "factor"), 
                 year = c(1988, 1988, 1989, 1989, 1989, 1990, 1990, 1991, 1991, 1991, 
                          1992, 1992, 1992, 1993, 1993, 1993, 1998, 1998, 1998, 1999, 
                          1999, 1999, 2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 
                          2003, 2003, 2003), 
                 age = c(`9104` = 1, `9341` = 2, `9292` = 1, `9632` = 2, `9960` = 4, 
                         `9543` = 1, `9857` = 3, `9685` = 1, `9968` = 2, `10169` = 3, 
                         `9858` = 1, `10127` = 2, `10212` = 3, `10009` = 1, `10284` = 2, 
                         `10522` = 6, `10464` = 1, `10605` = 2, `10830` = 3, `10556` = 1, 
                         `10744` = 2, `11056` = 3, `10687` = 1, `11061` = 2, `11279` = 5, 
                         `10912` = 1, `11109` = 2, `11197` = 3, `11065` = 1, `11194` = 2, 
                         `11154` = 1, `11255` = 2, `11324` = 3), 
                 pAGE = c(0.972, 0.028, 0.964, 0.005, 0.031, 0.823, 0.177, 0.921, 0.074, 
                          0.004, 0.846, 0.045, 0.109, 0.833, 0.155, 0.012, 0.927, 0.054, 
                          0.019, 0.784, 0.21, 0.005, 0.958, 0.017, 0.025, 0.852, 0.124, 
                          0.024, 0.913, 0.087, 0.909, 0.073, 0.019), 
                 cpAgeMin = c(0, 0.972, 0, 0.964, 0.969, 0, 0.823, 0, 0.921, 0.996, 0, 
                              0.846, 0.891, 0, 0.833, 0.988, 0, 0.927, 0.981, 0, 0.784, 
                              0.995, 0, 0.958, 0.975, 0, 0.852, 0.976, 0, 0.913, 0, 0.909, 
                              0.981), 
                 cpAgeMax = c(0.972, 1, 0.964, 0.969, 1, 0.823, 1, 0.921, 0.996, 1, 0.846, 
                              0.891, 1, 0.833, 0.988, 1, 0.927, 0.981, 1, 0.784, 0.995, 1, 
                              0.958, 0.975, 1, 0.852, 0.976, 1, 0.913, 1, 0.909, 0.981, 1),
                 seg = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 
                         2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)), 
            row.names = c(NA, -33L), class = c("tbl_df", "tbl", "data.frame"))

# plot
ggplot(data = df) + 
  geom_ribbon(aes(x = year, ymin = cpAgeMin, ymax = cpAgeMax, group = interaction(seg, factor(age)), fill = factor(age), col = factor(age))) +
  theme_bw()

enter image description here

UPDATE:

After implementing @Andy Baxter's solution to a second site, there are extra "bumps" outside of the range of the data that appear in the plot. How do I get rid of these?

enter image description here


Solution

  • Stacking it with geom_area and position = "fill" might be smoothest:

    library(tidyverse)
    
    
    df |> 
      ggplot() +
      geom_area(
        aes(
        x = year,
        y = pAGE,
        group = interaction(seg, factor(age)),
        fill = factor(age)
        ),
        position = position_fill(reverse = TRUE)  # to get right order
      )