Search code examples
rdataframeopenxlsx

R: Split Data Frame into Workbook/Worksheets WITH column names


Problem: In the following example I am not able to get the column names from DL_Import into the many worksheets of the workbook wb and further into the many worksheets of the output Excel file output.xlsx...

Context: I have a data frame called DL_Import with close to 8,000 rows of data and 19 columns. I need to break this up into multiple worksheets for the saved xlsx file in order to process them separately downstream. I have completed this by doing the following:

DL_Split <- split(DL_Import,rep(1:500,each=1000)) 
chunk <- 1000 
numRows <- nrow(DL_Import) 
DL_row  <- rep(1:ceiling(numRows/chunk),each=chunk)[1:numRows] 
DL_Split <- split(DL_Import,DL_row) 
numSheets <- length(DL_Split) 

wb <- createWorkbook() 
for (i in 1:numSheets)
{
  addWorksheet(wb, i) 
  writeData(wb, i, DL_Split[i], startRow = 1, startCol = 1, colNames = TRUE, rowNames = TRUE)
}

saveWorkbook(wb, file = "output.xlsx", overwrite = TRUE)

The workbook saves correctly with, in this case, 8 worksheets with the names "1", "2", ... "8" but is missing the column titles in row 1 in each worksheets.

Question: How can I ensure the column names from DL_Import gets propagated into each of the wb worksheets to ensure it's written to output.xlsx file?


Solution

  • the error is that you were using [ instead of [[, the difference between these two is fairly simple yet critical:

    • [ returns an object of the same class and can select one or more elements.
    • [[ returns only one element without coercing it to the type of the parent structure.

    if you try this code, you'll find out that they're different:

    class(DL_Split[1])
    #> [1] "list"
    class(DL_Split[[1]])
    #> [1] "data.frame"
    

    this is why openxlsx::writeData failed to get the column names and row names.


    the corrected code should look like this

    for (i in 1:numSheets)
    {
      addWorksheet(wb, i) 
      writeData(wb, i, DL_Split[[i]], startRow = 1, startCol = 1, colNames = TRUE, rowNames = TRUE)
    }