
How to change count to percentage in geom_bar when combining with other plot (ggplot2)?


I am trying to plot two graphics in one using ggplot2. I have two data frames that share a common variable (factor). They look like this:

tb1:

study       caribbean  south.america
alison_2010 1          0
james_1998  0          1
...

tb2:

study       stage.I stage.II stage.III ...
alison_2010 95.6    93.1     81.3
james_1998  94.2    80.7     74.5
...

I would like to draw one graph that combines both pieces of information: the results shown in tb2 and the region of origin shown in tb1. tb1 would be plotted as a bar plot (to create rectangles in the background), and tb2 as a dot plot.

I tried this:

library(ggplot2)
library(reshape2)  # melt() is assumed to come from reshape2

tb1 <- melt(tb1, id.vars = "study")
tb2 <- melt(tb2, id.vars = "study")

c <- ggplot() +
    geom_bar(data = tb1, aes(x = study, y = value, fill = variable),
             stat = "identity", position = position_fill(reverse = TRUE)) +
    geom_dotplot(data = tb2, aes(x = study, y = value, color = variable, fill = variable),
             binaxis = "y", stackdir = "center", binwidth = 1, dotsize = 1.5, group = 1)

I get this:

[Image: plot produced by the code above]

When I add + scale_y_continuous(labels=scales::percent)

I get this:

[Image: plot after adding the percent scale]

I tried using 100 instead of 1 in tb1, and I tried dividing the values in tb2 by 100. Neither worked.

I am not worried about the labels or anything at the moment. I just want the bar chart to plot percentage, not counts. Can anyone help me? Thank you!


Solution

  • It seems unusual to want bar plots behind your points when the bars are all zero or one. I have taken two approaches to answering your question.

    First, set up the data:

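    library(reshape2)  # melt(); assumed, the package is not named in the post
    library(dplyr)     # inner_join(), filter(), %>%
    library(ggplot2)

    tb1 = data.frame(study = c("alison_2010", "james_1998"),
                     caribbean = c(1, 0),
                     south.america = c(0, 1))
    tb2 = data.frame(study = c("alison_2010", "james_1998"),
                     stage1 = c(95, 94),
                     stage2 = c(93, 80),
                     stage3 = c(81, 74))

    # Long format: one row per study/region and one row per study/stage
    tb1a = melt(tb1, id.vars = "study")
    tb2a = melt(tb2, id.vars = "study")
    # Join on study, then keep only the region each study belongs to
    tba = inner_join(tb1a, tb2a, by = "study") %>% filter(value.x == 1)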
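
    To make the shape of the joined table concrete, here is a minimal sketch that rebuilds it and inspects its columns. It assumes melt() comes from reshape2 and the join helpers from dplyr; the names regions and stages are illustrative only. As a small variation, it filters the region table before joining, which keeps the join one-to-many (recent dplyr versions may warn about many-to-many joins).

```r
library(reshape2)
library(dplyr)

tb1 <- data.frame(study = c("alison_2010", "james_1998"),
                  caribbean = c(1, 0),
                  south.america = c(0, 1))
tb2 <- data.frame(study = c("alison_2010", "james_1998"),
                  stage1 = c(95, 94),
                  stage2 = c(93, 80),
                  stage3 = c(81, 74))

# Keep only the region each study belongs to, then attach the stage values
regions <- melt(tb1, id.vars = "study") %>% filter(value == 1)
stages  <- melt(tb2, id.vars = "study")
tba     <- inner_join(regions, stages, by = "study")

# Both inputs have 'variable' and 'value' columns, so the join adds .x/.y suffixes:
# study, variable.x, value.x (region) and variable.y, value.y (stage)
names(tba)
nrow(tba)  # one row per study/stage pair
```

    The .x columns carry the region information and the .y columns carry the stage results, which is why the plotting code refers to variable.x, value.x, variable.y and value.y.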
    

    Approach 1: assuming you only want to plot alison_2010 under Caribbean and james_1998 under South America, with the bars drawn behind the points:

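    ggplot(data = tba) +
        # Background bars: height is value.x (always 1 after the filter)
        geom_col(aes(x = interaction(study, variable.y), y = value.x, fill = variable.x)) +
        # Points: divide by 100 so they share the bars' 0-1 scale
        geom_point(aes(x = interaction(study, variable.y), y = value.y / 100, color = variable.y), size = 5) +
        scale_colour_manual(values = c("purple", "orange", "grey"))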
    

    [Image: result from approach 1]

    Notes:

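    • If you also want to plot the rows where a region indicator is 0 (for example, caribbean = 0), remove the filter(value.x == 1) step.

    • The bars use the fill aesthetic and the points use the color aesthetic. Mapping both layers to the same aesthetic would make them share one scale, which makes it difficult to give them separate colors.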
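
    Since the question asks for a percentage axis, one possible extension of approach 1 is to add scale_y_continuous(labels = scales::percent). The sketch below assumes the reshape2, dplyr, ggplot2 and scales packages; the axis titles and the object name p1 are illustrative choices, not part of the original answer. Because the bars have height 1 and the point values are divided by 100, both layers already sit on a 0 to 1 scale, so the percent labeller displays them as 0% to 100%.

```r
library(ggplot2)
library(reshape2)
library(dplyr)

tb1 <- data.frame(study = c("alison_2010", "james_1998"),
                  caribbean = c(1, 0),
                  south.america = c(0, 1))
tb2 <- data.frame(study = c("alison_2010", "james_1998"),
                  stage1 = c(95, 94),
                  stage2 = c(93, 80),
                  stage3 = c(81, 74))

# Region rows (filtered to the matching region) joined to the stage rows
tba <- inner_join(melt(tb1, id.vars = "study") %>% filter(value == 1),
                  melt(tb2, id.vars = "study"),
                  by = "study")

p1 <- ggplot(tba, aes(x = interaction(study, variable.y))) +
  geom_col(aes(y = value.x, fill = variable.x)) +            # bars of height 1
  geom_point(aes(y = value.y / 100, color = variable.y),     # points rescaled to 0-1
             size = 5) +
  scale_y_continuous(labels = scales::percent) +             # label 0-1 as 0%-100%
  labs(x = "study.stage", y = "percentage",
       fill = "region", color = "stage")

p1
```

    The key point is that both layers must be on the same numeric scale before the labeller is applied; the labeller only changes how the axis is printed, not the data.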

    Approach 2: faceting by region, which may be a simpler solution to your question:

    ggplot(data=tba) +
           geom_point(aes(x=study, y=value.y, color = variable.y), size = 5) +
           facet_grid(.~variable.x)
    

    [Image: result from approach 2]
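
    In the faceted version the values stay on their original 0 to 100 scale, so a percent axis needs a labeller that does not multiply by 100. A hedged sketch, assuming the scales package provides label_percent() with a scale argument (available in recent versions); the 0 to 100 axis limits and the object name p2 are illustrative assumptions:

```r
library(ggplot2)
library(reshape2)
library(dplyr)

tb1 <- data.frame(study = c("alison_2010", "james_1998"),
                  caribbean = c(1, 0),
                  south.america = c(0, 1))
tb2 <- data.frame(study = c("alison_2010", "james_1998"),
                  stage1 = c(95, 94),
                  stage2 = c(93, 80),
                  stage3 = c(81, 74))

# Region rows (filtered to the matching region) joined to the stage rows
tba <- inner_join(melt(tb1, id.vars = "study") %>% filter(value == 1),
                  melt(tb2, id.vars = "study"),
                  by = "study")

# scale = 1 tells the labeller the data are already percentages (95 -> "95%")
pct <- scales::label_percent(scale = 1, accuracy = 1)

p2 <- ggplot(tba) +
  geom_point(aes(x = study, y = value.y, color = variable.y), size = 5) +
  facet_grid(. ~ variable.x) +                       # one panel per region
  scale_y_continuous(labels = pct, limits = c(0, 100))

p2
```

    With facet_grid(. ~ variable.x) every study appears on the x axis of every panel; passing scales = "free_x" to facet_grid would drop the empty positions if that is preferred.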