Split groups in bar plot


I have a dataframe with values corresponding to two separate groups evaluated over time. Mock data below:

Gene Name   Sample S1   Sample S2   Sample S3   Sample R1   Sample R2   Sample R3
Gene 1      4           5           3           3           39          44
Gene 2      4           100         33          3           32          14

I melted my dataframe and compiled summary stats using the summarySE function. I then plotted my data using the following script:

plot = ggplot(tgastats2, aes(x=`Gene Name`, y=value, fill=Sample)) +
  geom_bar(position=position_dodge(), stat="identity") +
  geom_errorbar(aes(ymin=value-se, ymax=value+se),
                width=.2,
                position=position_dodge(.9))

What I would like to do is plot the values of S1-3 grouped together and R1-3 on the same plot separated with some space. Any help would be appreciated.


Solution

  • Here's the data in a reproducible way:

    df <- data.frame(
      Gene_name=c('Gene 1', 'Gene 2'),
      Sample.S1=c(4,4), Sample.S2=c(5,100), Sample.S3=c(3,33),
      Sample.R1=c(3,3), Sample.R2=c(39,32), Sample.R3=c(44,14)
    )
    

    Now, for a solution. As you indicated, we need to "melt" the dataset. My preference is to use gather() from tidyr, but melt() from reshape2 works in a similar manner:

    library(tidyr)  # provides gather() and re-exports the %>% pipe
    df1 <- df %>% gather(key='Sample', value='value', -Gene_name)
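
    For comparison, here is a sketch of the same reshaping step with melt() from reshape2, which is what the question mentions using. The name df1_melt is just a placeholder for this illustration. One difference worth knowing: melt() returns the Sample column as a factor in the original column order, whereas gather() returns it as character.

    ```r
    library(reshape2)

    # Same mock data as above
    df <- data.frame(
      Gene_name=c('Gene 1', 'Gene 2'),
      Sample.S1=c(4,4), Sample.S2=c(5,100), Sample.S3=c(3,33),
      Sample.R1=c(3,3), Sample.R2=c(39,32), Sample.R3=c(44,14)
    )

    # Gene_name is kept as the id column; every other column is
    # stacked into a Sample/value pair (2 genes x 6 samples = 12 rows)
    df1_melt <- melt(df, id.vars='Gene_name',
                     variable.name='Sample', value.name='value')
    ```

    Either result can be used in the steps that follow, since grepl() works on both character and factor columns.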
    

    For ggplot2 to group the bars the way you describe, the data needs a column that says which group each sample belongs to. ggplot2 has no way of knowing that S1, S2, and S3 belong together, so you have to tell it. There are many ways to derive such a column. Without seeing your actual melted data frame, tgastats2, I'll have to assume it's similar to the example posted. I'm going to use the fact that all samples R1-R3 contain a capital "R", whereas the others do not:

    df1$my_group <- ifelse(grepl('R',df1$Sample),'R','S')
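
    If the real sample names are messier than the mock data, matching a bare "R" anywhere in the name could misfire. As a hedged alternative, assuming every name follows the pattern Sample.<letters><digits> as in the example, the group label can be pulled out directly with sub(). The vector samples below only stands in for df1$Sample:

    ```r
    # Stand-in for df1$Sample, using the names from the mock data
    samples <- c('Sample.S1', 'Sample.S2', 'Sample.S3',
                 'Sample.R1', 'Sample.R2', 'Sample.R3')

    # Capture the letters between "Sample." and the trailing digits
    my_group <- sub('^Sample\\.([A-Za-z]+)[0-9]+$', '\\1', samples)
    ```

    This yields "S" for the first three names and "R" for the last three, and it extends to more than two groups without nesting ifelse() calls.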
    

    Then you can plot:

    library(ggplot2)
    ggplot(df1, aes(x=Gene_name, y=value, fill=my_group)) +
      geom_col(position='dodge', color='black')
    


    Hm... that doesn't look right. What's going on? ggplot is dodging based on df1$my_group, but each of those groups holds 3 values per gene, so the three bars end up drawn on top of one another. You can separate them by using the group= aesthetic in addition to the fill= aesthetic, which makes ggplot dodge each sample individually:

    ggplot(df1, aes(x=Gene_name, y=value, fill=my_group, group=Sample)) +
      geom_col(position='dodge', color='black')
    

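
    The question also asks for some space between the S and R bars. One possible way to get that, offered as a sketch rather than the only approach, is to put my_group on the x axis and give each gene its own panel with facet_wrap(). The data frame is rebuilt by hand here so the snippet runs on its own; the object name p is arbitrary.

    ```r
    library(ggplot2)

    # Long-format mock data, equivalent to df1 above
    df1 <- data.frame(
      Gene_name = rep(c('Gene 1', 'Gene 2'), times = 6),
      Sample = rep(c('Sample.S1', 'Sample.S2', 'Sample.S3',
                     'Sample.R1', 'Sample.R2', 'Sample.R3'), each = 2),
      value = c(4, 4, 5, 100, 3, 33, 3, 3, 39, 32, 44, 14)
    )
    df1$my_group <- ifelse(grepl('R', df1$Sample), 'R', 'S')

    # x = my_group puts S and R at separate axis positions (with a gap),
    # group = Sample dodges the three samples within each position,
    # and facet_wrap() draws one panel per gene
    p <- ggplot(df1, aes(x = my_group, y = value,
                         fill = my_group, group = Sample)) +
      geom_col(position = 'dodge', color = 'black') +
      facet_wrap(~ Gene_name)
    print(p)
    ```

    The gap between the S and R clusters comes from the normal spacing between discrete x positions, so no manual offsets are needed.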