Search code examples
rggplot2subplotstandard-deviationinsets

add plot inset (subplot) and multiple standard deviations to ggplot() with filtering based on conditions in R


Part 1: We are trying to make a ggplot() that shows multiple different summary statistics depending on the x-axis. We have more or less managed to do this, but are open to suggestions to help with efficiency.

  1. For Time 0: have error bars bounded from 6 g to 20 g
  2. For Time 1 to 8: mean +/- 1 standard deviation (SD)
  3. For Time 9 to 12: mean +/- 2 SD
  4. For Time >13: no error bars

Part 2: The data we are actually working on has Times up to 3000. Because of this, we would like to include a inset (subplot) in the upper left hand side of the plot for Time>=6, Time<=10.

Part 3: In addition to both things above, we want to then remove any data points that are outside the error bars (so that we can show a "before" (with all the points, including those outside the bounds of the error bars) and "after" plot (only points inside the bounds of the error bars for up to and including Time==12)).

For the purposes of reproducibility, I am using a dataset from R to illustrate my question. The dataset:

library(datasets)
data(ChickWeight) #importing data from base R
summary(ChickWeight)

 weight           Time           Chick         Diet   
 Min.   : 35.0   Min.   : 0.00   13     : 12   1:220  
 1st Qu.: 63.0   1st Qu.: 4.00   9      : 12   2:120  
 Median :103.0   Median :10.00   20     : 12   3:120  
 Mean   :121.8   Mean   :10.72   10     : 12   4:118  
 3rd Qu.:163.8   3rd Qu.:16.00   17     : 12          
 Max.   :373.0   Max.   :21.00   19     : 12          
                                 (Other):506 

I have been able to do some of this by first creating vectors (one for ymin and ymax in geom_errorbar). The code to do this is at the bottom of the question. We are open to suggestions on how to do this more efficiently.

We then try to put all this together in one ggplot() like so (excluding unneeded formatting):

#Import required package: 
library(ggplot2)
    
ggplot(merge_stats, aes(y = weight, x = as.numeric(Time))) +
    geom_jitter(color="grey", width=0.1)+
    geom_errorbar(aes(ymin=SDbelow, ymax=SDabove), width=0.1, size=1)+
    stat_summary(
      geom = "point",
      fun.y = "mean",
      col = "blue",
      size = 2,
      shape = 19,
      fill = "blue")

This generates:

enter image description here

How do we add a subplot() to this ggplot()? AND How do we then go on to delete the points outside the standard deviation parameters set above?

So, the final product should be two figures, one plot and subplot with all the data points and another plot and subplot without the deleted points.


Code to make the different standard deviations for each age group. Note, we are open to suggestions about streamlining this.

#loading required package
library(dplyr)
library(pracma)

#Creates a table that includes the SD of each age and the mean of each age 
merge_stats <- ChickWeight %>% 
    arrange(Time) %>% 
    group_by(Time) %>%
    mutate(MEAN=mean(weight), SD = sd(weight), SDt=2*sd(weight)) #add to data 

#Vector for Time==0:      
merge_stats_age_zero <- merge_stats %>%  
    filter(Time==0)
vl <-  length(merge_stats_age_zero$weight)
MSZUL=linspace(20, 20, vl) #Vector for top bound 
MSZLL=linspace(6, 6, vl)   #Vector for bottom bound 

#Vector for Time>=1, Time<=8:
mergesaot <- merge_stats %>%                  
    filter(Time>=1, Time<=8)

#vectors for +/- 1 SD for Time>=1, Time<=8:
otoerr = mergesaot$MEAN+mergesaot$SD
otberr = mergesaot$MEAN-mergesaot$SD

#Vector for Time>8, Time<=12:
mergesef <- merge_stats %>%                   
    filter(Time>8) %>%
    filter(Time<=12)

#vectors for +/- 2 SD for Time>8, Time<=12:
efoerr <- mergesef$MEAN+mergesef$SDt
efberr <- mergesef$MEAN-mergesef$SDt

#Combining vectors together:
LSDabove <- c(MSZUL ,otoerr, efoerr)
LSDbelow <- c(MSZLL ,otberr, efberr)

#To generate the final vector we need to first find its length. This is done by subtracting the length of the total by the three added together.
m_swt <- c(merge_stats$SD)
finpeice <- length(m_swt) - length(LSDabove)

#Knowing the length we will generate a vector of zeros to represent no error bars and to cover the remaining length of our errorbar vectors 
finpeiceVec <- linspace(0, 0, finpeice) 

#Finaly we have generated our two vectors to represent our error bars
SDabove <- c(MSZUL ,otoerr, efoerr, finpeiceVec)
SDbelow <- c(MSZLL ,otberr, efberr, finpeiceVec)

Solution

  • This is really two questions. The one about removing points outside the error bars really just comes down to filtering your data after the summary stats are created on the main data set. If you struggle with this then a dedicated question might be better. I will show here how to inset a subplot using grid, on which ggplot2 is built:

    subset1 <- which(merge_stats$Time >= 6 & merge_stats$Time <= 10)
    
    p1 <- ggplot(merge_stats[subset1, ], 
                 aes(y = weight, x = as.numeric(Time))) +
      geom_jitter(color="grey", width=0.1)+
      geom_errorbar(aes(ymin=SDbelow[subset1], ymax=SDabove[subset1]), width=0.1, size=1)+
      stat_summary(
        geom = "point",
        fun.y = "mean",
        col = "blue",
        size = 2,
        shape = 19,
        fill = "blue")
    
    inset <- ggplotGrob(p1)
    
    ggplot(merge_stats, aes(y = weight, x = as.numeric(Time))) +
      geom_jitter(color="grey", width=0.1)+
      geom_errorbar(aes(ymin=SDbelow, ymax=SDabove), width=0.1, size=1)+
      stat_summary(
        geom = "point",
        fun.y = "mean",
        col = "blue",
        size = 2,
        shape = 19,
        fill = "blue")
    
    vp <- grid::viewport(width = 0.4, height = 0.4, x = 0.3, y = 0.7)
    
    print(p1, vp = vp)
    

    enter image description here