I have some data, of which this is a subset:
MyDataToSO <- data.frame(
  Age = c(2, 7, 12, 16, 21),
  AgeGroup = c("0-4 years", "5-9 years", "10-14 years", "15-17 years", "18-24 years"),
  Proportion = c(0.963, 0.965, 0.925, 0.701, 0.422))
I wish to plot the data so that, on the x-axis, the relevant AgeGroup shows under each Age tick mark. The Age values are the mid-points of the AgeGroup categories.
I have the plot I want, except for adding the AgeGroup bands under the relevant parts of the x-axis:
ggplot(data = MyDataToSO, aes(x = Age, y = Proportion)) +
  geom_point() +
  geom_point(data = subset(MyDataToSO, Age %in% c(16, 21)), color = "green") +
  scale_x_continuous(breaks = seq(0, 30, by = 10)) +
  labs(x = "Age group", y = "Proportion")
The graph works, showing each Age in the correct position, but there is no indication that the Age values arise from age groups.
I thought it would be useful to show this by having a second label on the x-axis, so that the resulting x-axis looks a bit like:
|
|______________________________...
| | | ...
2 7 12 ...
|__________|_________|_________|...
"0-4 years 5-9 years 10-14 years"...
I will need to play around with the font size a bit to get this working. I'd also like the age-group lines to be lighter than the normal printing (e.g. 25% less opaque than normal). I've put the quote marks around the age group labels to stop SO from showing each number there as orange numeric.
How can I add this information onto my graph? I did a search for secondary labels, but only found questions relating to having a secondary axis. As you can see, the required grouping information is stored in AgeGroup, so I would "just" need to extract the relevant values from there.
Edit: I loaded the ggh4x package and the ggplot code is now this:
ggplot(data = MyDataToSO, aes(interaction(Age, AgeGroup), Proportion)) +
  geom_point() +
  geom_point(data = subset(MyDataToSO, Age %in% c(16, 21)), color = "green") +
  scale_x_continuous(breaks = seq(0, 30, by = 10)) +
  guides(x = "axis_nested") +
  labs(x = "Age group", y = "Proportion")
but it is giving an error because the x-axis is continuous.
Edit 2: The green points are interpolations. I now have interpolations for ages 17 through 20, but these repeat the same AgeGroup label. Is this a problem?
Another approach would be to add annotations, turn off clipping, and put in more space between the axis text and axis titles, like so:
ggplot(data = MyDataToSO, aes(x = Age, y = Proportion)) +
  geom_point() +
  geom_point(data = subset(MyDataToSO, Age %in% c(16, 21)), color = "green") +
  scale_x_continuous(breaks = seq(0, 30, by = 10)) +
  labs(x = "Age group", y = "Proportion") +
  # Gray bands drawn below the panel, one per age group
  annotate("rect", fill = "gray80",
           xmin = c(0, 5, 10, 15, 18),
           xmax = c(5, 10, 15, 18, 24) - 0.2,
           ymin = 0.28, ymax = 0.32) +
  # Age-group labels placed on top of the bands
  annotate("text", size = 3,
           x = MyDataToSO$Age + 0.5,
           y = 0.3, label = MyDataToSO$AgeGroup) +
  # clip = "off" lets the annotations draw outside the panel area
  coord_cartesian(ylim = c(0.4, 1), clip = "off") +
  # Extra top margin pushes the axis title below the bands
  theme(axis.title.x = element_text(margin = margin(t = 25, r = 0, b = 0, l = 0)))
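The band limits in the annotate("rect") call above are typed in by hand. Since the question notes that the grouping is already stored in AgeGroup, a minimal base-R sketch of extracting the limits from the labels might look like the following. It assumes every label has the form "lower-upper years"; the helper name parse_age_bands is made up for illustration and is not part of ggplot2 or any other package. Because it works on unique labels, repeated labels (as in Edit 2 of the question) would yield one band each.

```r
# Hypothetical helper: turn labels such as "0-4 years" into numeric band limits.
# Assumes each label contains "<lower>-<upper>"; other formats give NA limits.
parse_age_bands <- function(labels) {
  labels <- unique(as.character(labels))   # one band per distinct label
  limits <- regmatches(labels, gregexpr("[0-9]+", labels))
  lower  <- vapply(limits, function(x) as.numeric(x[1]), numeric(1))
  upper  <- vapply(limits, function(x) as.numeric(x[2]), numeric(1))
  data.frame(label = labels,
             xmin  = lower,
             xmax  = upper + 1,            # so adjacent bands meet, e.g. 0-4 ends at 5
             mid   = (lower + upper) / 2,  # matches the Age mid-points in the question
             stringsAsFactors = FALSE)
}

bands <- parse_age_bands(c("0-4 years", "5-9 years", "10-14 years",
                           "15-17 years", "18-24 years"))
```

With this, bands$xmin and bands$xmax - 0.2 could be passed to annotate("rect"), and bands$mid with bands$label to annotate("text"). Note that the last band ends at 25 here rather than the 24 used in the hand-typed version.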
Edit: Based on my understanding of the additional comment, I am now splitting out ages 15:21 individually.
MyDataToSO <- data.frame(
  Age = c(2, 7, 12, 15:21),
  AgeGroup = c("0-4 years", "5-9 years", "10-14 years", 15:21),
  Proportion = c(0.963, 0.965, 0.925, 0.701, 0.740, 0.677, 0.610, 0.540, 0.470, 0.401))
ggplot(data = MyDataToSO, aes(x = Age, y = Proportion)) +
  geom_point() +
  geom_point(data = subset(MyDataToSO, Age %in% c(16, 21)), color = "green") +
  scale_x_continuous(breaks = seq(0, 30, by = 10)) +
  labs(x = "Age group", y = "Proportion") +
  # Wide bands for the three age groups, narrow ones for each single age 15 to 21
  annotate("rect", fill = "gray80",
           xmin = c(0, 5, 10, 15:21) - 0.4,
           xmax = c(5, 10, 15, 16:22) - 0.6,
           ymin = 0.28, ymax = 0.32) +
  annotate("text", size = 3,
           x = MyDataToSO$Age,
           y = 0.3, label = MyDataToSO$AgeGroup) +
  coord_cartesian(ylim = c(0.4, 1), clip = "off") +
  theme(axis.title.x = element_text(margin = margin(t = 25, r = 0, b = 0, l = 0)))
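The question also asks for the age-group bands to print lighter than the normal axis text. One way to sketch this with the annotation approach is the alpha argument, which annotate() passes on to the rect and text layers. The value 0.75 is only the question's "25% less opaque" read literally, and the original five-row data is used here to keep the example short.

```r
library(ggplot2)

MyDataToSO <- data.frame(
  Age = c(2, 7, 12, 16, 21),
  AgeGroup = c("0-4 years", "5-9 years", "10-14 years", "15-17 years", "18-24 years"),
  Proportion = c(0.963, 0.965, 0.925, 0.701, 0.422))

p <- ggplot(MyDataToSO, aes(x = Age, y = Proportion)) +
  geom_point() +
  # alpha = 0.75 makes the bands and their labels 25% transparent (assumed value)
  annotate("rect", fill = "gray80", alpha = 0.75,
           xmin = c(0, 5, 10, 15, 18),
           xmax = c(5, 10, 15, 18, 24) - 0.2,
           ymin = 0.28, ymax = 0.32) +
  annotate("text", size = 3, alpha = 0.75,
           x = MyDataToSO$Age + 0.5, y = 0.3,
           label = MyDataToSO$AgeGroup) +
  coord_cartesian(ylim = c(0.4, 1), clip = "off") +
  labs(x = "Age group", y = "Proportion") +
  theme(axis.title.x = element_text(margin = margin(t = 25)))

p
```

The size argument of the text annotation is the place to adjust the label font size if the labels collide with the narrower bands.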