Tags: r, ggplot2, rstudio, knitr, xtable

Formatting output with Knitr, ggplot2 and xtable


I am trying to achieve the following with knitr, ggplot2 and xtable:

  • Generate several annotated plots of beta distributions with ggplot2
  • Lay out the output so that every plot is followed by its corresponding summary statistics table
  • Write the code so that both PDF and HTML reports can be generated in a presentable way

Here is my attempt at this task (Rnw file):

\documentclass{article}

\begin{document}

Test for ggplot2 with Knitr

<<Initialize, echo=FALSE>>=
library(ggplot2)
library(ggthemes)
library(data.table)
library(grid)
library(xtable)
library(plyr)

pltlist <- list()
statlist <- list()

@

The libraries are loaded. Now run the main loop.


<<plotloop, echo=FALSE>>=
for (k in seq(1, 7)) {
  x <- data.table(rbeta(100000, 1.6, 14 + k))
  xmean <- mean(x$V1, na.rm = TRUE)
  xqtl <- quantile(x$V1, probs = c(0.995), names = FALSE)
  xdiff <- xqtl - xmean
  dens <- density(x$V1)
  xscale <- (max(dens$x, na.rm = TRUE) - min(dens$x, na.rm = TRUE)) / 100
  yscale <- max(dens$y, na.rm = TRUE) / 100
  y_max <- max(dens$y, na.rm = TRUE)
  y_intercept <- y_max - (10 * yscale)
  data <- data.frame(x)

  y <- ggplot(data, aes(x = V1)) +
    geom_density(colour = "darkgreen", size = 2, fill = "green", alpha = .3) +
    geom_vline(xintercept = xmean, colour = "blue", linetype = "longdash") +
    geom_vline(xintercept = xqtl, colour = "red", linetype = "longdash") +
    geom_segment(aes(x = xmean, xend = xqtl, y = y_intercept, yend = y_intercept),
                 colour = "red", linetype = "solid",
                 arrow = arrow(length = unit(0.2, "cm"), ends = "both", type = "closed")) +
    annotate("text", x = xmean + xscale, y = y_max, label = paste("Val1:", round(xmean, 4)), hjust = 0) +
    annotate("text", x = xqtl + xscale, y = y_max, label = paste("Val2:", round(xqtl, 4))) +
    annotate("text", x = xmean + 10 * xscale, y = y_max - 15 * yscale, label = paste("Val3:", round(xdiff, 4))) +
    xlim(min(dens$x, na.rm = TRUE), xqtl + 9 * xscale) +
    xlab("Values") +
    ggtitle("Beta Distribution") +
    theme_bw() +
    theme(plot.title = element_text(hjust = 0, vjust = 2))

  pltlist[[k]] <- y
  statlist[[k]] <- list(mean = xmean, quantile = xqtl)
}

stats <- ldply(statlist, data.frame)
@

Plots are ready. Now plot them.

<<PrintPlots, warning=FALSE, results='asis', echo=FALSE, cache=TRUE,  fig.height=3.5>>=
for (k in seq(1,7)){
  print(pltlist[[k]])
  print(xtable(stats[k,], caption="Summary Statistics", digits=6))
}

@

Plotting Finished.


\end{document}

I am faced with several issues after running this code.

  1. When I run this code as plain R code and then print the plots stored in the list, the horizontal line drawn by geom_segment ends up in a different, seemingly random position on each plot. If I create and print the figures individually, without putting them in a list, they look exactly as I expect.
  2. Only the last plot looks as expected; in all the other plots, the geom_segment line seems to move around randomly.
  3. I am also unable to give each plot its own caption, as I can for the tables.

Points to note :

  • I am storing the beta-random numbers in data.table since in our actual code, we are using data.table. However for the purposes of testing ggplot2 in this way, I convert the data.table into a data.frame, as ggplot2 requires.
  • I also need to generate the random numbers within the loop and generate the plots per iteration (so something like first generating the random numbers and then using melt would not work here), since generating the random numbers is emulating a complex database call per iteration of the loop.

I am using RStudio Version 0.98.1091 and R version 3.1.2 (2014-10-31) on Windows 8.1

This is the expected plot: [image: Expected Plot]

This is the plot I get when plotting from the list: [image: Plot from the list]

My output in PDF form: [image: PDF Output]

Please advise if you have any ideas for a solution.

Thank you,

SG


Solution

  • I don't know why the horizontal line in geom_segment is "moving around" from plot to plot, rather than spanning xmean to xqtl. However, I was able to get the horizontal line in the correct location by getting the value from the stats data frame, rather than from direct calculation of the mean and quantile. You just have to create the stats data frame before the loop, rather than after, so that you can use it in the loop.

      stats <- ldply(statlist, data.frame)

      for (k in seq(1, 7)) {
        ...

        y <- ggplot(data, aes(x = V1)) +
          ...
          geom_segment(aes(x = stats[k, 1], xend = stats[k, 2], y = y_intercept, yend = y_intercept),
                       colour = "red", linetype = "solid",
                       arrow = arrow(length = unit(0.2, "cm"), ends = "both", type = "closed")) +
          ...

        pltlist[[k]] <- y
        statlist[[k]] <- list(mean = xmean, quantile = xqtl)
      }
    

    Hopefully, someone else will be able to explain the anomalous behavior, but at least this seems to fix the problem.
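A likely explanation for the "moving" segment, which would also account for only the last plot being correct: aes() does not store the current values of xmean, xqtl and y_intercept; it stores the expressions, and ggplot2 evaluates them only when the plot is built, i.e. when print() is called. By then the loop has finished, so every stored plot looks up the values from the last iteration. The geom_vline() and annotate() calls are unaffected because their arguments are ordinary values evaluated immediately. The same mechanism probably explains why the stats[k, 1] workaround above appears to work: that expression is also evaluated at print time, when the printing loop has set k to the matching index (y_intercept inside the same aes() would still take its last-iteration value). The sketch below reproduces the effect with a hypothetical toy loop (the names lazy_plots, eager_plots, m and seg_x are illustrative, not from the question) and uses annotate("segment", ...) as one way to pass fixed values; ggplot_build() is used only to inspect where each segment ends up.

```r
library(ggplot2)

# Hypothetical toy loop: `m` takes a new value on every iteration
lazy_plots  <- list()
eager_plots <- list()
df <- data.frame(v = c(0, 50))

for (k in 1:3) {
  m <- k * 10

  # aes() records the *expression* `m`; it is looked up only when the plot is built
  lazy_plots[[k]] <- ggplot(df, aes(x = v)) +
    geom_segment(aes(x = m, xend = m + 1, y = 0, yend = 0))

  # annotate() receives the *value* of `m` at this iteration
  eager_plots[[k]] <- ggplot(df, aes(x = v)) +
    annotate("segment", x = m, xend = m + 1, y = 0, yend = 0)
}

# Build each plot (print() does this internally) and read back the segment's start
seg_x <- function(p) ggplot_build(p)$data[[1]]$x[1]

sapply(lazy_plots, seg_x)   # 30 30 30: every plot sees `m` from the last iteration
sapply(eager_plots, seg_x)  # 10 20 30: each plot keeps its own value
```

In the question's loop, the corresponding change would be to replace the geom_segment(aes(...)) layer with annotate("segment", x = xmean, xend = xqtl, y = y_intercept, yend = y_intercept, colour = "red", arrow = arrow(length = unit(0.2, "cm"), ends = "both", type = "closed")). Passing a one-row data frame through the layer's data argument, e.g. geom_segment(data = data.frame(x0 = xmean, x1 = xqtl, y0 = y_intercept), aes(x = x0, xend = x1, y = y0, yend = y0), ...), should also work, because the values are copied into the data frame when the layer is created.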

  • For the figure caption, you can add a fig.cap argument to the chunk where you plot the figures, although this gives every figure the same caption and causes the figures and tables to be placed in separate groups rather than interleaved:

      <<PrintPlots, warning=FALSE, results='asis', echo=FALSE, cache=TRUE, fig.cap="Caption", fig.height=3.5>>=
      for (k in seq(1,7)){
        print(pltlist[[k]])
        print(xtable(stats[k,], caption="Summary Statistics", digits=6))
      }
      @
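A possible variant for the PDF output, as a sketch that has not been checked against the 2014 package versions in the question: give fig.cap a character vector with one entry per plot (knitr applies the captions to the figures of a chunk in order; worth confirming in the chunk-options documentation for your installed version), and keep LaTeX from floating figures and tables away from each other by using the [H] placement from the float package, via the knitr chunk option fig.pos for the figures and the table.placement argument of print.xtable() for the tables. This needs \usepackage{float} in the document preamble. The caption texts are placeholders.

```r
<<PrintPlots, warning=FALSE, results='asis', echo=FALSE, fig.height=3.5, fig.pos='H', fig.cap=paste("Beta distribution, plot", 1:7)>>=
for (k in seq(1, 7)) {
  # each print() of a ggplot becomes one figure; captions are taken from fig.cap in order
  print(pltlist[[k]])
  # table.placement = "H" (float package) keeps each table directly after its figure
  print(xtable(stats[k, ], caption = paste("Summary statistics, plot", k), digits = 6),
        table.placement = "H")
}
@
```

fig.pos and table.placement only affect LaTeX output. In an HTML report the figures are not floats, so they stay in order anyway, and the tables can be printed with print(xtable(...), type = "html") instead.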