I am trying to plot trends in the age of university applicants. Using data from various databases, I build the following data frame:
> AgeGroup <- c("Year", "17","18","19","20", "21", "22", "23", "24", "25to29", "30to39", "40plus"); AgeGroup
[1] "Year" "17" "18" "19" "20" "21" "22" "23" "24"
[10] "25to29" "30to39" "40plus"
> AGEgroups <- as.data.frame(cbind(a,h,i,j, k, l, m, n, o, p, q, r)); AGEgroups
a h i j k l m n o p q r
1 2004 1053 160450 74600 25778 14317 9761 6995 5589 15902 17171 8351
2 2005 1115 175406 77751 28368 15191 10551 7778 6107 18153 18695 9686
...
9 2012 743 199213 93669 37214 21240 14651 10962 8781 26387 27246 15308
10 2013 702 201821 103356 39185 21557 15242 11226 8707 27326 26887 15442
> colnames(AGEgroups) <- AgeGroup
> AGEgroups
Year 17 18 19 20 21 22 23 24 25to29 30to39 40plus
1 2004 1053 160450 74600 25778 14317 9761 6995 5589 15902 17171 8351
...
10 2013 702 201821 103356 39185 21557 15242 11226 8707 27326 26887 15442
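As an aside, here is a minimal sketch of building the same table in one step, assuming the counts are typed in directly rather than taken from the single-letter vectors. It uses only the 2004 and 2005 rows shown above. `check.names = FALSE` keeps column names such as `17` as they are, and `data.frame()` avoids the `cbind()` step followed by a separate `colnames<-` call.

```r
# Sketch only: the 2004 and 2005 rows from the printout above.
AGEgroups <- data.frame(
  Year     = c(2004, 2005),
  `17`     = c(1053, 1115),
  `18`     = c(160450, 175406),
  `19`     = c(74600, 77751),
  `20`     = c(25778, 28368),
  `21`     = c(14317, 15191),
  `22`     = c(9761, 10551),
  `23`     = c(6995, 7778),
  `24`     = c(5589, 6107),
  `25to29` = c(15902, 18153),
  `30to39` = c(17171, 18695),
  `40plus` = c(8351, 9686),
  check.names = FALSE  # keep "17", "25to29", ... instead of "X17", "X25to29", ...
)

names(AGEgroups)
```

With the default `check.names = TRUE`, R would rewrite these names into syntactically valid ones, which is why the flag is set here.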
Then I plot the graph using the ggplot2 library:
> ggplot(AGEgroups,aes(x=Year, y=NumerOfApplicants, fill=Age.Range)) +
+ geom_area(data = AGEgroups, aes(x=Year, y=h, fill="17 yrs"))+
+ geom_area(data = AGEgroups, aes(x=Year, y=i, fill="18 yrs"))+
+ geom_area(data = AGEgroups, aes(x=Year, y=j, fill="19 yrs"))+
...
I get a graph that generally looks OK (although my attempt to customise the colours failed, and I cannot post the image because I do not have enough reputation points), but only 5 age groups get plotted instead of 11.
When I try to plot them separately using:
ggplot(AGEgroups,aes(x=Year, y=NumerOfApplicants, fill=Age.Range)) +
geom_area(data = AGEgroups, aes(x=Year, y=l, fill="21 yrs"))
the majority work fine, but then when I plot:
ggplot(AGEgroups,aes(x=Year, y=NumerOfApplicants, fill=Age.Range)) +
geom_area(data = AGEgroups, aes(x=Year, y=m, fill="22 yrs"))
which is the missing group, I get the error message:
Error: unexpected numeric constant in:
"ggplot(AGEgroups,aes(x=Year, y=NumerOfApplicants, fill=Age.Range)) +
geom_area(data = AGEgroups, aes(x=Year, y=m, fill="22"
I have compared both lines of code and can see no difference in the syntax. The 'm' vector is displayed when I enter its name. Any ideas why this might be happening?
I do not get the unexpected numeric constant error today after restarting the computer, which means the old "switch on/off" technique solves at least 50% of problems;)
Still, the graph displays 5 instead of 11 variables. The suggested dput(head(AGEgroups)) yields the following output:
structure(list(Year = 2004:2009, `17` = c(1053L, 1115L, 937L,
1023L, 1273L, 1236L), `18` = c(160450L, 175406L, 173806L, 176306L,
187802L, 197090L), `19` = c(74600L, 77751L, 71285L, 83706L, 89462L,
97544L), `20` = c(25778L, 28368L, 27003L, 29955L, 36255L, 38451L
), `21` = c(14317L, 15191L, 15464L, 16550L, 19745L, 22110L),
`22` = c(9761L, 10551L, 10287L, 11498L, 13384L, 15132L),
`23` = c(6995L, 7778L, 7664L, 8054L, 9801L, 11080L), `24` = c(5589L,
6107L, 5948L, 6150L, 7470L, 8810L), `25to29` = c(15902L,
18153L, 18001L, 18833L, 23578L, 27299L), `30to39` = c(17171L,
18695L, 17818L, 17861L, 22643L, 26781L), `40plus` = c(8351L,
9686L, 9854L, 10141L, 13183L, 15888L)), .Names = c("Year",
"17", "18", "19", "20", "21", "22", "23", "24", "25to29", "30to39",
"40plus"), row.names = c(NA, 6L), class = "data.frame")
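Note that the columns in this structure are named `17`, `22` and so on, not `h` or `m`, so plotting code that uses `y=m` is reading a separate vector from the workspace rather than a column of the data frame. The sketch below, which assumes a cut-down two-row data frame with the made-up name `df`, shows how such non-syntactic column names can be referenced directly with backticks or `[[`. As far as I know the same backtick quoting also works inside `aes()`.

```r
# Cut-down example: two rows and one age column from the dput() output above.
df <- data.frame(
  Year = 2004:2005,
  `22` = c(9761L, 10551L),
  check.names = FALSE
)

# Both forms return the column named "22".
by_backtick <- df$`22`
by_name     <- df[["22"]]

identical(by_backtick, by_name)
```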
I still can't run your code above because it's missing all the single-letter variables, and I don't want to define those manually, so I can't reproduce the error.
But a better way to plot your data would be to melt it first.
library(reshape2)
mm<-melt(AGEgroups, id.vars="Year")
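To see what the melted data looks like, here is a small sketch on a cut-down version of the data (two years and three age columns taken from the `dput()` output). The reduced data frame is only for illustration.

```r
library(reshape2)

# Two rows and three age columns from the dput() output.
AGEgroups <- data.frame(
  Year = 2004:2005,
  `17` = c(1053L, 1115L),
  `18` = c(160450L, 175406L),
  `19` = c(74600L, 77751L),
  check.names = FALSE
)

# One row per Year/age-group pair: columns Year, variable, value.
mm <- melt(AGEgroups, id.vars = "Year")
str(mm)
```

Each former column name ends up as a level of the `variable` factor, which is what lets a single `fill=variable` mapping cover every age group.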
then plot with
ggplot(mm,aes(x=Year, y=value, fill=variable)) +
geom_area() + ylab("Number of Applicants") +
scale_fill_hue(name = "Age Range",
labels=c(paste(17:24, "yrs"),"25 to 29", "30 to 39", "40+"))
which produces a stacked area chart with one band per age group.
Here we clearly label the plot using the more standard assignments rather than relying on the side effects of using imaginary variables in the aesthetics. This makes the intention of the code much clearer.
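A possible explanation for seeing only 5 groups in the original plot, offered as a guess since I could not run that code: separate `geom_area()` layers each start at zero and are painted one over another, so a later, taller series hides an earlier, shorter one, whereas a single `geom_area()` on the melted data stacks the groups by default. The base-R sketch below checks this for the 2004 row, assuming the layers were added in column order with opaque fills. The names `counts`, `visible` and `shown` are made up for the illustration.

```r
# 2004 applicant counts, in the column order of AGEgroups.
counts <- c(`17` = 1053, `18` = 160450, `19` = 74600, `20` = 25778,
            `21` = 14317, `22` = 9761, `23` = 6995, `24` = 5589,
            `25to29` = 15902, `30to39` = 17171, `40plus` = 8351)

# A layer stays visible only if no later layer is taller, i.e. if it
# equals the running maximum taken from the end of the vector.
visible <- counts >= rev(cummax(rev(counts)))

shown <- names(counts)[visible]
shown
# "18" "19" "20" "30to39" "40plus"
```

Under those assumptions exactly five series survive for 2004, which matches the number of groups reported in the question.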