Tags: r, ggplot2, axis-labels

How do I show a group variable along with the numeric midpoint on the axis of my plot?


I have some data, of which this is a subset:

MyDataToSO <- data.frame(Age = c(2, 7, 12, 16, 21),
                     AgeGroup = c("0-4 years", "5-9 years", "10-14 years", "15-17 years", "18-24 years"),
                     Proportion = c(0.963, 0.965, 0.925, 0.701, 0.422))

I want to plot the data so that the relevant AgeGroup label appears under each Age tick mark on the x-axis. The Age values are the midpoints of the AgeGroup categories.

I have the plot I want, except for the AgeGroup bands under the relevant parts of the x-axis:

library(ggplot2)

ggplot(data = MyDataToSO, aes(x = Age, y = Proportion)) +
  geom_point() +
  # highlight the interpolated points in green
  geom_point(data = subset(MyDataToSO, Age %in% c(16, 21)), color = "green") +
  scale_x_continuous(breaks = seq(0, 30, by = 10)) +
  labs(x = "Age group", y = "Proportion")

The graph works and shows each Age value in the correct position, but nothing indicates that the Age values come from age groups.

I thought it would be useful to show this with a second row of labels on the x-axis, so that the resulting axis looks a bit like this:

|
|______________________________...
      |         |         |    ...
      2         7         12   ...
|__________|_________|_________|...
 "0-4 years  5-9 years  10-14 years"...

I will need to play around with the font size a bit to get this working. I'd also like the age-group lines to be lighter than the normal axis lines (e.g. 25% less opaque). I've put quote marks around the age-group labels only to stop SO from highlighting the numbers in them as orange numeric literals.

How can I add this information to my graph? I searched for secondary labels but only found questions about adding a secondary axis. The required grouping information is already stored in AgeGroup, so I would "just" need to extract the relevant values from there.
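Since the grouping information already lives in AgeGroup, here is a minimal base-R sketch of that extraction step. It assumes every label has the form "L-U years". parse_age_bands() is a hypothetical helper name, not part of any package. The sketch pulls out the numeric band limits and checks that the stored Age values really are the band midpoints:

```r
MyDataToSO <- data.frame(Age = c(2, 7, 12, 16, 21),
                         AgeGroup = c("0-4 years", "5-9 years", "10-14 years",
                                      "15-17 years", "18-24 years"),
                         Proportion = c(0.963, 0.965, 0.925, 0.701, 0.422),
                         stringsAsFactors = FALSE)

# Hypothetical helper: split "L-U years" labels into numeric limits.
# Assumes every label starts with two whole numbers joined by a hyphen.
parse_age_bands <- function(labels) {
  pattern <- "^([0-9]+)-([0-9]+).*$"
  lower <- as.numeric(sub(pattern, "\\1", labels))
  upper <- as.numeric(sub(pattern, "\\2", labels))
  data.frame(AgeGroup = labels, lower = lower, upper = upper,
             mid = (lower + upper) / 2, stringsAsFactors = FALSE)
}

bands <- parse_age_bands(MyDataToSO$AgeGroup)
print(bands)

# TRUE if each stored Age is the midpoint of its AgeGroup band
print(isTRUE(all.equal(bands$mid, MyDataToSO$Age)))
```

The lower and upper columns are the values an annotation layer would need to draw one band per group.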

Edit: I loaded the ggh4x package and the ggplot code is now this:

library(ggplot2)
library(ggh4x)

ggplot(data = MyDataToSO, aes(interaction(Age, AgeGroup), Proportion)) +
  geom_point() +
  geom_point(data = subset(MyDataToSO, Age %in% c(16, 21)), color = "green") +
  # interaction() returns a factor, so x is discrete here; this line triggers the error
  scale_x_continuous(breaks = seq(0, 30, by = 10)) +
  guides(x = "axis_nested") +
  labs(x = "Age group", y = "Proportion")

but it gives an error, because the x scale is continuous while interaction() produces a discrete (factor) variable.
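For comparison, here is a simpler option that avoids ggh4x altogether. It is a swapped-in technique, not the one tried above. The idea is to keep x continuous, place the breaks at the Age midpoints, and build two-line tick labels with paste0() so that each AgeGroup prints under its Age value. This is a sketch using the question's data; the font size of 8 is only an arbitrary starting point:

```r
library(ggplot2)

MyDataToSO <- data.frame(Age = c(2, 7, 12, 16, 21),
                         AgeGroup = c("0-4 years", "5-9 years", "10-14 years",
                                      "15-17 years", "18-24 years"),
                         Proportion = c(0.963, 0.965, 0.925, 0.701, 0.422),
                         stringsAsFactors = FALSE)

# One label per break: the Age value, a line break, then its AgeGroup
x_labels <- paste0(MyDataToSO$Age, "\n", MyDataToSO$AgeGroup)

p <- ggplot(data = MyDataToSO, aes(x = Age, y = Proportion)) +
  geom_point() +
  geom_point(data = subset(MyDataToSO, Age %in% c(16, 21)), color = "green") +
  # Breaks at the midpoints so each label sits under its own point
  scale_x_continuous(breaks = MyDataToSO$Age, labels = x_labels) +
  labs(x = "Age group", y = "Proportion") +
  theme(axis.text.x = element_text(size = 8))

print(p)
```

In plain ggplot2 both lines of each tick label share the axis.text.x style, so the group names cannot be made lighter than the numbers this way. The annotation approach in the answer does allow that.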

Edit 2: The green points are interpolated values. I now also have interpolations for ages 17 through 20, but these repeat the same AgeGroup label. Is this a problem?
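Whether the repeated labels are a problem mostly depends on how they are drawn. One way to handle them, assuming whole-year bands with the lower limits shown in the AgeGroup labels, is to look up each age's group with findInterval(). Each group label is then drawn once, rather than once per row. This is a sketch, not the answer's method:

```r
# Band table (assumption: lower limits read off the AgeGroup labels)
bands <- data.frame(AgeGroup = c("0-4 years", "5-9 years", "10-14 years",
                                 "15-17 years", "18-24 years"),
                    lower = c(0, 5, 10, 15, 18),
                    stringsAsFactors = FALSE)

# Original ages plus the interpolated ages 17 to 20 from Edit 2
ages <- c(2, 7, 12, 16:21)

# findInterval() gives, for each age, the index of the last lower limit <= age
group_index <- findInterval(ages, bands$lower)
age_data <- data.frame(Age = ages,
                       AgeGroup = bands$AgeGroup[group_index],
                       stringsAsFactors = FALSE)
print(age_data)

# One label per group: drawing labels from here rather than from every
# row of age_data avoids printing "18-24 years" four times.
group_labels <- unique(age_data$AgeGroup)
print(group_labels)
```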


Solution

  • Another approach is to add annotations, turn off clipping, and add more space between the axis text and the axis title, like so:

    library(ggplot2)

    ggplot(data = MyDataToSO, aes(x = Age, y = Proportion)) +
      geom_point() +
      geom_point(data = subset(MyDataToSO, Age %in% c(16,21)), color = "green") +
      scale_x_continuous(breaks=seq(0, 30, by = 10)) +
      labs(x = "Age group", y = "Proportion") +
      # grey bands below the panel, one per age group
      annotate("rect", fill = "gray80",
               xmin = c(0, 5, 10, 15, 18),
               xmax = c(5, 10, 15, 18, 24) - 0.2,
               ymin = 0.28, ymax = 0.32) +
      # age-group labels drawn on top of the bands
      annotate("text", size = 3,
               x = MyDataToSO$Age + 0.5,
               y = 0.3, label = MyDataToSO$AgeGroup) +
      # clip = "off" keeps the annotations below the y-limits visible
      coord_cartesian(ylim = c(0.4, 1), clip = "off") +
      # extra top margin moves the axis title below the bands
      theme(axis.title.x = element_text(margin = margin(t = 25, r = 0, b = 0, l = 0)))
    


    Edit: Based on my understanding of the additional comment, ages 15 to 21 are now split out individually.

    MyDataToSO <- data.frame(Age = c(2, 7, 12, 15:21),
                             AgeGroup = c("0-4 years", "5-9 years", "10-14 years", 15:21),
                             Proportion = c(0.963, 0.965, 0.925, 0.701, .740, .677, .610, .540, .470, .401))
    
    
    ggplot(data = MyDataToSO, aes(x = Age, y = Proportion)) +
      geom_point() +
      geom_point(data = subset(MyDataToSO, Age %in% c(16,21)), color = "green") +
      scale_x_continuous(breaks=seq(0, 30, by = 10)) +
      labs(x = "Age group", y = "Proportion") +
      annotate("rect", fill = "gray80",
               xmin = c(0, 5, 10, 15:21) - 0.4,
               xmax = c(5, 10, 15, 16:22) - 0.6,
               ymin = 0.28, ymax = 0.32) +
      annotate("text", size = 3,
               x = MyDataToSO$Age,
               y = 0.3, label = MyDataToSO$AgeGroup) +
      coord_cartesian(ylim = c(0.4, 1), clip = "off") +
      theme(axis.title.x = element_text(margin = margin(t = 25, r = 0, b = 0, l = 0)))
    

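Here is a variant of the annotation approach, assuming the original five-band data from the question. It computes the rectangle edges from the band limits instead of typing them in, and it uses alpha to make the bands lighter, since the question asks for roughly 25% less opaque. The lower and upper vectors are read off the AgeGroup labels by hand. Treat this as a sketch rather than a drop-in replacement:

```r
library(ggplot2)

MyDataToSO <- data.frame(Age = c(2, 7, 12, 16, 21),
                         AgeGroup = c("0-4 years", "5-9 years", "10-14 years",
                                      "15-17 years", "18-24 years"),
                         Proportion = c(0.963, 0.965, 0.925, 0.701, 0.422),
                         stringsAsFactors = FALSE)

# Band limits read off the AgeGroup labels by hand (assumes whole-year bands)
lower <- c(0, 5, 10, 15, 18)
upper <- c(4, 9, 14, 17, 24)

# A whole-year band "L-U" spans L - 0.5 to U + 0.5; trimming 0.1 on each
# side leaves a small visible gap between neighbouring bands.
xmin <- lower - 0.4
xmax <- upper + 0.4

p <- ggplot(data = MyDataToSO, aes(x = Age, y = Proportion)) +
  geom_point() +
  geom_point(data = subset(MyDataToSO, Age %in% c(16, 21)), color = "green") +
  scale_x_continuous(breaks = seq(0, 30, by = 10)) +
  labs(x = "Age group", y = "Proportion") +
  # alpha = 0.75 makes the bands 25% less opaque than a solid fill
  annotate("rect", fill = "gray80", alpha = 0.75,
           xmin = xmin, xmax = xmax, ymin = 0.28, ymax = 0.32) +
  # Labels centred on each band (these centres equal the Age values)
  annotate("text", size = 3, x = (lower + upper) / 2, y = 0.3,
           label = MyDataToSO$AgeGroup) +
  # clip = "off" keeps the annotations below the y-limits visible
  coord_cartesian(ylim = c(0.4, 1), clip = "off") +
  theme(axis.title.x = element_text(margin = margin(t = 25)))

print(p)
```

Centring the labels on (lower + upper) / 2 rather than on Age + 0.5 keeps each label aligned with its band, because those centres are exactly the stored Age midpoints.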