Search code examples
rknitrxtable

Loops, knitr and xtable in rmarkdown to create unique tables in multiple reports


I'm revising my question in full. I realized it was long, and my point was getting lost.

Here's what I need to do:

Create automatic reports for schools that contain tables that compare their data to the district that school is in as well as the whole state. The State is the entire data set.

Here's what I understand:

How to create the automated loop that transverses through the data and creates a unique PDF report for each school. This post was very helpful in setting up a framework to generate reports.

Here's what I need help with:

I need a table that contains the following columns: School, District, State I also need the first column of the table to contain a row for: Sample Size, Mean, Standard Deviation.

I'm trying to create this in the context of a for loop because I need a unique table in each unique pdf that is created. If there is a better approach I'd love to hear about it.

Anyways, here is a reproducible example that I tested. I haven't gotten far in creating the table.

Any help would be very appreciated.

driver.r:

# Create dataset
set.seed(500)
School <- rep(seq(1:20), 2)
District <- rep(c(rep("East", 10), rep("West", 10)), 2)
Score <- rnorm(40, 100, 15)
Student.ID <- sample(1:1000,8,replace=T)
school.data <- data.frame(School, District, Score, Student.ID)

#prepare for multicore processing 
require(parallel)
# generate the rmd files, one for each school in df
library(knitr)
mclapply(unique(school.data$School), function(x) 
  knit("F:/sample-auto/auto.Rmd", 
       output=paste('report_', x, '.Rmd', sep="")))

# generate PDFs from the rmd files, one for each school in df
mclapply(unique(school.data$School), function(x)
  rmarkdown::render(paste0("F:/sample-auto/", paste0('report_', x, '.Rmd'))))

auto.Rmd:

---
title: "Automated Report Generation for Data"
author: "ME"
date: "February 5, 2015"
output: 
  pdf_document:
  toc: true
  number_sections: true
---

```{r, echo=FALSE}
library(xtable)
library(plyr)
df <- data.frame(school.data)
subgroup <- df[school.data$School == x,]
```

# Start of attempt 

```{r results='asis', echo=FALSE}
 for(school in unique(subgroup$School))
{
subgroup2 <- subgroup[subgroup$School == school,]
savename <- paste(x, school)
df2<- mean(subgroup2$Score, na.rm=TRUE)
df2 <- data.frame(df2)
print(xtable(df2))
}
```

I also tried replacing the loop with:

```{r results='asis', echo=FALSE}
df2 <- ddply(school.data, .(School), summarise, n = length(School), mean =      
mean(Score), sd = sd(Score))
print(xtable(df2))
```

This gives me what I don't want in that all of the schools get the data for every school vs. just their school.


Solution

  • If you are using a loop to subset the data before passing it to the .rmd file, you don't really need plyr or ddply to do the split/apply/combine for you. Since you have a lot of observations, it may be noticeable overhead.

    Also, if you are creating the subgroups before running the .rmd, you don't need the loop inside the file, either. You just need to make a data frame with the stats you want and use xtable

    ---
    title: "Automated Report Generation for Data"
    author: "ME"
    date: "February 5, 2015"
    output: 
      pdf_document:
        toc: true
        number_sections: true
    ---
    
    ```{r, echo=FALSE}
    library(xtable)
    library(plyr)
    # Create dataset
    set.seed(500)
    School <- rep(seq(1:20), 2)
    District <- rep(c(rep("East", 10), rep("West", 10)), 2)
    Score <- rnorm(40, 100, 15)
    Student.ID <- sample(1:1000,8,replace=T)
    school.data <- data.frame(School, District, Score, Student.ID)
    
    
    x <- unique(school.data$School)[1]
    subgroup <- school.data[school.data$School == x, ]
    ```
    
    # Start of attempt 
    
    ```{r results='asis', echo=FALSE}
    options(xtable.comment = FALSE)
    ## for one school, it is redundant to split based on school but works
    ## likewise, it is redundant to have a loop here to split based on school
    ## if you have already used a loop to create the subgroup data 
    res <- ddply(subgroup, .(School), summarise,
                 n = length(School),
                 mean = mean(Score),
                 SD = sd(Score),
                 VAR = var(Score))
    xtable(res)
    
    ## if you pass in the entire data frame you will get all schools
    ## then you can subset the one you want
    res <- ddply(school.data, .(School), summarise,
                 n = length(School),
                 mean = mean(Score),
                 SD = sd(Score),
                 VAR = var(Score))
    
    xtable(res[res$School %in% x, ])
    ```
    

    enter image description here