r  plot  boxplot  multiple-axes

Formatting data to display multiple boxplots in R, also creating a double y-axis in R


I have a data set that has observations of two variables (incidence (0-100) and severity (0-5)) across 5 years. It looks something like this:

    cbb.incidence  avg.severity  Year
1       86.666667     2.0333333  2009
2       83.333333     1.8666667  2009
3       20.000000     1.2000000  2009
4       26.666667     1.2666667  2010
5       86.666667     1.9000000  2010
6       86.666667     1.8666667  2010
7       86.666667     2.0333333  2011
8       83.333333     1.8666667  2011
9       20.000000     1.2000000  2012
10      26.666667     1.2666667  2012
11      86.666667     1.9000000  2013
12      86.666667     1.8666667  2013

What I want is a figure with two box plots for each year, one for each variable. I found this exact question here on Stack Overflow: Plot multiple boxplot in one graph

So I "melt" the data as they describe in the example, and then plot it as described:

library(reshape2)  # melt()
library(ggplot2)

meltedData <- melt(incidence_all, id.var='Year')
ggplot(data=meltedData, aes(x=Year, y=value)) +
  geom_boxplot(aes(fill=variable))

The data appears to be in the correct format. The melted data looks like this (this is a subset; there are >2000 rows):

     Year  variable       value
1017 2009  avg.severity   1.5333333
1018 2009  avg.severity   2.1333333
1019 2009  avg.severity   2.0666667
1020 2009  avg.severity   2.0000000
1021 2009  avg.severity   2.0666667
1022 2009  avg.severity   1.6333333
1023 2009  avg.severity   1.5666667
1024 2009  cbb.incidence  16.777775
1025 2009  cbb.incidence  35.888865

Will one of you R wizards please tell me what I'm doing wrong?

ALSO, I already know that my two variables are on very different scales (incidence is from 0-100, and severity is 1-5), so if I simply plot both with the same y-axis scale the smaller values will be unreadable. I would like to have a double y-axis, one on the left and one on the right, with each variable scaled to its own axis. I have not seen a box-plot example with this feature. Can someone recommend how to approach this, preferably in ggplot?

THANK YOU!!


Solution

  • Try converting Year to a factor first:

    incidence_all$Year=factor(incidence_all$Year)
    
    meltedData<-melt(incidence_all, id.var="Year")
    ggplot(data=meltedData, aes(x=Year, y=value)) +
      geom_boxplot(aes(fill=variable))
    

    This should give a separate box for each variable within each year.
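To see why the conversion matters, here is a minimal sketch on made-up data (the `toy` data frame is an assumption for illustration, not the asker's data set). With a numeric Year, ggplot2 treats x as continuous and groups the boxes only by `variable`; with a factor Year, x is discrete and each Year/variable combination gets its own box. `layer_data()` returns the computed boxplot statistics, one row per box, so it can be used to compare the two plots.

```r
library(reshape2)
library(ggplot2)

# Toy data (made up for illustration; not the asker's data set)
set.seed(1)
toy <- data.frame(
  cbb.incidence = runif(30, 0, 100),
  avg.severity  = runif(30, 1, 5),
  Year          = rep(2009:2013, each = 6)
)

# Numeric Year: x is continuous, so boxes are grouped only by `variable`
# (ggplot2 is expected to warn about the continuous x aesthetic here)
melted_num <- melt(toy, id.vars = "Year")
p_num <- ggplot(melted_num, aes(x = Year, y = value)) +
  geom_boxplot(aes(fill = variable))

# Factor Year: x is discrete, so each Year/variable pair gets its own box
toy$Year <- factor(toy$Year)
melted_fac <- melt(toy, id.vars = "Year")
p_fac <- ggplot(melted_fac, aes(x = Year, y = value)) +
  geom_boxplot(aes(fill = variable))

# One row per drawn box in each plot
n_boxes_num <- nrow(layer_data(p_num))
n_boxes_fac <- nrow(layer_data(p_fac))
```

If you would rather not modify the data frame, writing `aes(x = factor(Year), y = value)` in the plot call should have the same effect.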

    For the second question, one alternative is to rescale severity so that its maximum maps to 100, putting it on the same range as incidence:

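    # Rescaled copy of severity: its maximum becomes 100
    incidence_all$avg.severitys=incidence_all$avg.severity*100/max(incidence_all$avg.severity)
    
    # Drop the original avg.severity (column 2) before melting
    meltedData<-melt(incidence_all[,-2], id.var="Year")
    ggplot(data=meltedData, aes(x=Year, y=value)) +
      geom_boxplot(aes(fill=variable))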
    

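If you want the right-hand axis the question asks for, the rescaling idea can be combined with ggplot2's `sec_axis()`, which draws a secondary axis that is a one-to-one transformation of the primary one. The sketch below is only one possible approach, on made-up data: the `toy` data frame, the `severity.scaled` column name, the axis labels, and the fixed factor of 20 (which assumes severity tops out at 5 and incidence at 100) are all assumptions. Unlike the snippet above, it uses a fixed factor rather than `max(avg.severity)` so that the right-hand axis reads directly in severity units.

```r
library(reshape2)
library(ggplot2)

# Toy data (made up for illustration; not the asker's data set)
set.seed(1)
toy <- data.frame(
  cbb.incidence = runif(30, 0, 100),
  avg.severity  = runif(30, 1, 5),
  Year          = factor(rep(2009:2013, each = 6))
)

# Assumed conversion: severity 5 corresponds to incidence 100, so multiply by 20
toy$severity.scaled <- toy$avg.severity * 20

# Keep only the columns that share the 0-100 scale
melted <- melt(toy[, c("Year", "cbb.incidence", "severity.scaled")],
               id.vars = "Year")

p <- ggplot(melted, aes(x = Year, y = value)) +
  geom_boxplot(aes(fill = variable)) +
  scale_y_continuous(
    name = "Incidence (%)",
    # Right-hand axis divides by the same factor of 20,
    # so its tick labels are in the original severity units
    sec.axis = sec_axis(~ . / 20, name = "Average severity")
  )
print(p)
```

Both sets of boxes are still drawn against the left axis; the right axis only relabels it, so the factor used in the data and in `sec_axis()` must match.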