Tags: r, aggregate, line-plot

Using the "aggregate" function for drawing line plots


I am trying to draw a line plot with error bars for two groups of data (Treatment vs. Control). There are 20 periods in total, 10 Trial Periods (TP) and 10 formal Periods (P), and I want to show how the group means change over time. For simplicity, the following data frame includes only 3 Trial Periods (TP1, TP5, TP10) and 3 formal Periods (P1, P5, P10).

Below is my code. My problem is that the "aggregate" function changes the order of the periods by re-sorting them alphabetically as strings, which messes up the time trend. I want them to be ordered as TP1 -> TP5 -> TP10 -> P1 -> P5 -> P10.

I suppose this is not too tricky, but I’m just stuck. I’d appreciate it if someone could tell me how to solve this problem.

Also: as there are 20 periods in total, it might look better to draw error bands (or CI bands) instead of numerous error bars. Is there a way to do this?

df <- data.frame(Condition=c(rep("Treatment", 10), rep("Control", 10)), 
               TP1=rnorm(20, 1, 1), TP5=rnorm(20, 5, 1), TP10=rnorm(20, 10, 1), 
               P1=rnorm(20, 1, 1), P5=rnorm(20, 5, 1), P10=rnorm(20, 10, 1))

temp <- tidyr::gather(df, Period, x, -Condition)

m <- aggregate(x~Period + Condition, temp, mean)

st.err <- function(x) sqrt(var(x)/length(x))

se <- aggregate(x~Period + Condition, temp, st.err)

ci.data <- cbind(m, se[, 3])

colnames(ci.data) <-  c("Period", "Condition", "Mean", "SE")

library(ggplot2)
ggplot(data=ci.data, aes(x=Period, y=Mean, group=Condition, color=Condition)) + 
  geom_line() + 
  geom_point() +  
  geom_errorbar(aes(ymin=Mean-SE, ymax=Mean + SE), 
                width=.1, position=position_dodge(0.05)) 

Solution

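  • reshape2::melt() turns the period names into a factor whose levels follow the column order of the data frame, so the periods stay in the order TP1, TP5, TP10, P1, P5, P10 instead of being sorted alphabetically. There is no need for further aggregation here, since geom_smooth() fits a smoothed trend for each group and draws a confidence band around it.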

    df.long <- reshape2::melt(df, "Condition", variable.name="Period", value.name = "x")
    
    library(ggplot2)
    ggplot(df.long, aes(x=Period, y=x, group=Condition, color=Condition, fill=Condition)) + 
      geom_smooth(method="loess", level=0.95, alpha=.2) 
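
If you would rather keep the aggregate() approach from the question, the ordering can also be fixed by turning Period into a factor with explicit levels before aggregating. The following is a minimal base-R sketch, not the answer's method; the helper name period_cols is an assumption introduced here for illustration, and the long-format step is written without tidyr so the block runs on its own.

```r
set.seed(42)
df <- data.frame(Condition = rep(c("Treatment", "Control"), each = 10),
                 TP1 = rnorm(20, 1, 1), TP5 = rnorm(20, 5, 1), TP10 = rnorm(20, 10, 1),
                 P1 = rnorm(20, 1, 1), P5 = rnorm(20, 5, 1), P10 = rnorm(20, 10, 1))

# Column order of the data frame defines the intended time order
# (period_cols is a hypothetical helper name).
period_cols <- setdiff(names(df), "Condition")

# Long format in base R; Period starts out as plain character strings.
temp <- data.frame(
  Condition = rep(df$Condition, times = length(period_cols)),
  Period    = rep(period_cols, each = nrow(df)),
  x         = unlist(df[period_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Set the level order explicitly so nothing falls back to alphabetical sorting.
temp$Period <- factor(temp$Period, levels = period_cols)

st.err <- function(x) sqrt(var(x) / length(x))
m  <- aggregate(x ~ Period + Condition, temp, mean)
se <- aggregate(x ~ Period + Condition, temp, st.err)

# Both aggregate() calls return rows in the same order, so the columns line up.
ci.data <- data.frame(m[, c("Period", "Condition")], Mean = m$x, SE = se$x)
```

ci.data can then go into the ggplot() call from the question unchanged, because ggplot2 places a factor on a discrete axis in the order of its levels.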
    

    [Image: the resulting plot]

    Data

    set.seed(42)  # for sake of reproducibility
    df <- data.frame(Condition=c(rep("Treatment", 10), rep("Control", 10)), 
                   TP1=rnorm(20, 1, 1), TP5=rnorm(20, 5, 1), TP10=rnorm(20, 10, 1), 
                   P1=rnorm(20, 1, 1), P5=rnorm(20, 5, 1), P10=rnorm(20, 10, 1))
    

    Edit

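    For your larger dataset, you can adjust the amount of smoothing with the span argument of geom_smooth(); smaller values make the curve follow the data more closely. To export the plot, open a png() device, draw the plot, and close the device with dev.off(). The file is written to your working directory.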

    png("test.png", width=1080, height=720, res=100)
    # print() makes sure the plot is drawn when this runs inside a function, loop, or source()
    print(
      ggplot(df.long, aes(x=Period, y=x, group=Condition, color=Condition, fill=Condition)) + 
        geom_smooth(method="loess", level=0.95, alpha=.2, span=.2) +
        labs(title="My Plot")
    )
    dev.off()
    

    [Image: the exported plot]
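
As an alternative to the loess band, the error bars from the question can be turned into a band of mean ± 1 SE with geom_ribbon(). This is only a sketch under stated assumptions: the long-format data below is simulated for illustration (its period means are made up), and the band shows ± 1 SE rather than a 95% confidence interval.

```r
library(ggplot2)

set.seed(42)
periods <- c("TP1", "TP5", "TP10", "P1", "P5", "P10")

# Hypothetical long-format data: 10 observations per period and condition.
temp <- expand.grid(id = 1:10, Period = periods,
                    Condition = c("Treatment", "Control"),
                    stringsAsFactors = FALSE)
# Simulated values centred on the period number (1, 5 or 10), for illustration only.
temp$x <- rnorm(nrow(temp), mean = as.numeric(gsub("\\D", "", temp$Period)), sd = 1)
temp$Period <- factor(temp$Period, levels = periods)  # keep the time order

# Mean and standard error per Period and Condition in a single aggregate() call.
summ <- aggregate(x ~ Period + Condition, temp,
                  function(v) c(Mean = mean(v), SE = sqrt(var(v) / length(v))))
ci.data <- data.frame(summ[c("Period", "Condition")], summ$x)

p <- ggplot(ci.data, aes(x = Period, y = Mean, group = Condition,
                         color = Condition, fill = Condition)) +
  # colour = NA drops the ribbon outline so only the shaded band is drawn
  geom_ribbon(aes(ymin = Mean - SE, ymax = Mean + SE), alpha = 0.2, colour = NA) +
  geom_line() +
  geom_point()
print(p)
```

The group aesthetic matters here: with a discrete x axis, geom_line() and geom_ribbon() need it to know which points belong to the same series. For an approximate 95% band, the SE could be multiplied by qt(0.975, n - 1), with n the number of observations per cell.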