Search code examples
rxlsx

R (xlsx) - Write Multiple Excel Files From DF Based on Column Values


I have a dataframe with many rows (>300,000) which I would like to write to multiple Excel files using xlsx package.

Sample data -

myData <- data.frame(Letter=c("A","B","C","D","E","F","G","H","I","J"),
                 Fruit = c("Apples", "Pears", "Oranges", 
                           "Carrots", "Mangoes", "Potatoes",
                           "Bananas", "Plums","Grapes", 
                           "Broccoli"),
                 Country = c("Scotland", "Scotland", "Scotland",
                             "England", "England", "England",
                             "Wales", "Wales", "Wales",
                             "Ireland"))

So, from the sample data I would like to split my Excel workbooks by the 'Country' column. There would be 4 Excel files with the country name as the filename and I would like them to look like below -

Letter  Fruit    Country
------------------------
A       Apples   Scotland
B       Pears    Scotland
C       Oranges  Scotland

Taking the 'Country' column out of the dataset would be a bonus.

I've figured out how to split these into multiple dataframes using a For loop, and then I could write a 'write.xlsx' line for each country. But my list of countries is dynamic. See below example -

for (i in unique(myData$Country)) {
  test <- paste0("excel", i)
  assign(test, myData[myData$Country == i,])
}

I've struggled to fit the write.xlsx function into that loop, but perhaps there's a better way.

Could anybody help please?


Solution

  • I would just split them into a list, then use an lapply for the write.xlsx. I'd suggest lapply over a for-loop.

    myDatalist <- split(myData, myData$Country)
    lapply(1:length(myDatalist), function(x) write.xlsx(myDatalist[[x]],
    file = paste0(names(myDatalist[x]), ".xlsx"), row.names = FALSE))
    

    All this does is apply a function to a list.