Tags: r, ggplot2, time-series, factorial

Plot raw and predict values for 2x2x2 time-series


Here is a sample of my data:

library(tidyr)
library(dplyr)
library(ggplot2)

resource <- c("good","good","bad","bad","good","good","bad","bad","good","good","bad","bad","good","good","bad","bad")

fertilizer <- c("none", "nitrogen","none","nitrogen","none", "nitrogen","none","nitrogen","none", "nitrogen","none","nitrogen","none", "nitrogen","none","nitrogen")

set.seed(1)  # make the random placeholder data reproducible
t0 <-  sample(1:20, 16)
t1 <-  sample(1:20, 16) 
t2 <-  sample(1:20, 16)
t3 <-  sample(1:20, 16)
t4 <-  sample(1:20, 16)
t5 <-  sample(1:20, 16)
t6 <-  sample(10:100, 16)
t7 <-  sample(10:100, 16)
t8 <-  sample(10:100, 16)
t9 <-  sample(10:100, 16)
t10 <-  sample(10:100, 16)

replicates <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)

data <- data.frame(resource, fertilizer,replicates, t0,t1,t2,t3,t4,t5,t6,t7,t8,t9,t10)

data$resource <- as.factor(data$resource)
data$fertilizer <- as.factor(data$fertilizer)

data.melt <- data %>% ungroup %>% gather(time, value, -replicates, -resource, -fertilizer)

# Placeholder for the model's predictions (16 units x 11 time points = 176 rows)
data.melt$predict <- sample(1:200, nrow(data.melt))

There are two factors, resource and fertilizer, each with two levels, so there are effectively 4 treatments with 4 replicates each (4 x 4 = 16 units). Time is a factor with 11 levels (t0 to t10). I fitted a model and stored its predicted values in the predict column.
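A compact way to build the same layout and check the design arithmetic above, sketched under the assumption of tidyr >= 1.0, where pivot_longer() is the recommended replacement for the superseded gather(). The values are random placeholders, as in the sample data, and the names units, time_cols and cell_counts are illustrative only.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # reproducible random placeholder values

# 16 experimental units: the 2 x 2 treatment combinations, 4 replicates each
units <- data.frame(
  resource   = factor(rep(c("good", "good", "bad", "bad"), times = 4)),
  fertilizer = factor(rep(c("none", "nitrogen"), times = 8)),
  replicates = 1:16
)

# One column per time point t0..t10 (11 levels), filled with placeholder values
time_cols <- paste0("t", 0:10)
for (tc in time_cols) {
  units[[tc]] <- sample(1:100, 16)
}

# Long format: 16 units x 11 time points = 176 rows,
# with time as a factor in chronological (not alphabetical) order
data.melt <- units %>%
  pivot_longer(all_of(time_cols), names_to = "time", values_to = "value") %>%
  mutate(time = factor(time, levels = time_cols))

# Each resource x fertilizer x time cell should hold exactly 4 replicates
cell_counts <- data.melt %>% count(resource, fertilizer, time)
print(cell_counts)
```

Every row of cell_counts should show n equal to 4: 4 treatments x 11 time points gives 44 cells of 4 replicates each.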

Now I want to plot a time series with time on the x-axis and both the raw values (value) and the mean fitted value (mean of predict) on the y-axis, for each combination of resource and fertilizer (4 treatments, i.e. 4 plots). I also want to add a confidence interval for the algal growth at each time point. Here is my attempt at the code:

ggplot(data.melt, aes(x=time, y=predict)) + geom_point(size=3) + stat_summary(geom = "point", fun = "mean") + facet_grid(resource + fertilizer ~ .)

With this code I still get only 2 plots instead of 4, and the means of predict are not plotted. I also don't know how to plot value and predict together, along with the corresponding confidence intervals.

It would also help if someone could show how to put all four treatments on a single plot, as well as how to facet them (as above).


Solution

  • My proposed solution is to create a second data.frame containing all the summary statistics, such as the mean predicted value. I show one way to do this with group_by and summarise from the dplyr package. The summary data needs columns resource, fertilizer and time that match the main data, plus columns holding the additional y values (the mean and the interval bounds).

    Then pass the main data and the summary data separately, through the data argument of each geom_* layer, rather than in the main ggplot() call. facet_grid(resource ~ fertilizer) splits the data into four panels, one per treatment.

    # Convert time to factor, specifying correct order of time points.
    data.melt$time = factor(data.melt$time, levels=paste("t", seq(0, 10), sep=""))
    
    # Create an auxiliary data.frame containing summary data.
    # I've used +/- 1 standard deviation as a placeholder for confidence intervals;
    # I'll let you calculate those on your own.
    summary_dat = data.melt %>%
                  group_by(resource, fertilizer, time) %>%
                  summarise(mean_predicted=mean(predict),
                            upper_ci=mean(predict) + sd(predict),
                            lower_ci=mean(predict) - sd(predict),
                            .groups="drop")  # return an ungrouped result (dplyr >= 1.0.0)
    
    p = ggplot() + 
        theme_bw() +
        geom_errorbar(data=summary_dat, aes(x=time, ymax=upper_ci, ymin=lower_ci),
                      width=0.3, linewidth=0.7, colour="tomato") +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
        geom_point(data=data.melt, aes(x=time, y=value),
                   size=1.6, colour="grey20", alpha=0.5) +
        geom_point(data=summary_dat, aes(x=time, y=mean_predicted),
                   size=3, shape=21, fill="tomato", colour="grey20") +
        facet_grid(resource ~ fertilizer)
    
    ggsave("plot.png", plot=p, height=4, width=6.5, units="in", dpi=150)
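The code above leaves the confidence interval as an exercise. One common choice, assumed here rather than taken from the original answer, is a t-based 95% interval for each cell mean, mean +/- qt(0.975, n - 1) * sd / sqrt(n), which presumes roughly normal, independent replicates. The data frame below is a hypothetical stand-in with the same relevant columns as data.melt; with the real data only the summarise() step is needed, and its lower_ci and upper_ci columns can replace the placeholder ones in the same geom_errorbar() call.

```r
library(dplyr)

set.seed(1)

# Hypothetical stand-in for data.melt: 4 replicates x 2 resources x 2 fertilizers x 11 times
data.melt <- expand.grid(
  replicate  = 1:4,
  resource   = c("good", "bad"),
  fertilizer = c("none", "nitrogen"),
  time       = paste0("t", 0:10)
)
data.melt$time <- factor(data.melt$time, levels = paste0("t", 0:10))
data.melt$predict <- sample(1:200, nrow(data.melt), replace = TRUE)

# 95% t-based confidence interval for the mean prediction in each cell
summary_dat <- data.melt %>%
  group_by(resource, fertilizer, time) %>%
  summarise(
    n_obs          = n(),
    mean_predicted = mean(predict),
    se             = sd(predict) / sqrt(n_obs),
    half_width     = qt(0.975, df = n_obs - 1) * se,
    lower_ci       = mean_predicted - half_width,
    upper_ci       = mean_predicted + half_width,
    .groups = "drop"
  )
```

With only 4 replicates per cell the t multiplier is about 3.18, so these intervals will be noticeably wider than the +/- 1 SD placeholder.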
    

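The question also asked for all four treatments on a single plot. A minimal sketch, assuming a combined treatment label built with interaction() is acceptable: map it to colour and use position_dodge() so the treatments sit side by side at each time point. It keeps the +/- 1 SD placeholder bars from the answer, and the stand-in data and the names treatment and p_single are hypothetical. Adding facet_grid(resource ~ fertilizer) to the same plot would give the faceted layout again, one treatment per panel.

```r
library(dplyr)
library(ggplot2)

set.seed(1)

# Hypothetical stand-in for data.melt (same columns as in the question)
data.melt <- expand.grid(
  replicate  = 1:4,
  resource   = c("good", "bad"),
  fertilizer = c("none", "nitrogen"),
  time       = paste0("t", 0:10)
)
data.melt$time    <- factor(data.melt$time, levels = paste0("t", 0:10))
data.melt$value   <- sample(1:100, nrow(data.melt), replace = TRUE)
data.melt$predict <- sample(1:200, nrow(data.melt), replace = TRUE)

# One label per treatment, e.g. "good.none"
data.melt$treatment <- interaction(data.melt$resource, data.melt$fertilizer)

# Placeholder bars: mean +/- 1 SD, as in the answer above
summary_dat <- data.melt %>%
  group_by(treatment, time) %>%
  summarise(mean_predicted = mean(predict),
            lower_ci = mean(predict) - sd(predict),
            upper_ci = mean(predict) + sd(predict),
            .groups = "drop")

# All four treatments in one panel, distinguished by colour;
# the shared dodge keeps points and error bars of a treatment aligned
dodge <- position_dodge(width = 0.6)

p_single <- ggplot(mapping = aes(x = time, colour = treatment)) +
  theme_bw() +
  geom_point(data = data.melt, aes(y = value),
             position = dodge, size = 1.2, alpha = 0.4) +
  geom_errorbar(data = summary_dat, aes(ymin = lower_ci, ymax = upper_ci),
                position = dodge, width = 0.4) +
  geom_point(data = summary_dat, aes(y = mean_predicted),
             position = dodge, size = 2.5)
```

Using one shared position_dodge object for every layer is a deliberate choice: it places the raw points, the error bars and the mean points of each treatment at the same shifted x position.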