Search code examples
rexcelxlconnectvba

R/vba - Open and save several workbooks (+500) and discarding formulas ie. keeping only dataframe


I would like to open several workbook files (xlsx), with multiple spreadsheets, discard the formulas in each cell and only keeping the data inside. I would do it mannualy but with at least 500 workbooks i would like to automise the task.

I found myself multiple partial solutions. Maybe the most simple one would be to do it via vba, but i have no idea how to write those instructions. The other solutions i could manage would be through R.

I tested several packages, the most usefull being xlsx, unfortunatelly i can only save 1 spreadsheed at a time, and i have more than 15 per workbook, and i didnt find how to combine each spreadsheet in one file (one workbook).

The solution i came so far is to open files inside a loop, then save them again with XLConnect. I searched for a write-without-formula, knowing that open-without-formula exists, but i didn't manage to find the option ... Is there a simple solution with those packages or should i go more into VBA ?

tentative in r :

library("XLConnect")
wbdir <- setwd("C:/Users/blabla/Desktop/wbfiles")
wblist <- list.files(wbdir)

i <- 0L
for (i in 1:length(wblist)) {

        wb_formulas <- loadWorkbook(wblist[[i]])
        # something to overwriteformulas : opening witout formulas like xlsx/openxlsx package 
        # writing without ...
        wb_df <- saveWorkbook(wb_formulas, paste(c("wb_", i,".xlsx"),collapse = ""))
}

Solution

  • Consider R's RDComclient which can COM interface to the Excel object library where you can use R to loop through needed files and corresponding worksheets and use Excel's own methods (not third-party package methods), Range.Copy and Range.PasteSpecial, to copy cell values, leaving out formulas.

    Below procedure is wrapped in a tryCatch to cleanly exit the processing even with simple errors. Do note: some COMExceptions are not caught and you will need to exit the EXCEL.EXE process from Task Manager.

    library(RDCOMClient)
    
    wbdir <- setwd("C:/Users/blabla/Desktop/wbfiles")
    wblist <- list.files(wbdir, full.names = TRUE)
    
    xlPasteValues <- -4163
    
    for (wb in wblist) {    
      tryCatch({
        # Create COM Connection to Excel
        xlApp <- COMCreate("Excel.Application")
        xlApp[['Visible']] <- FALSE
        xlApp[['DisplayAlerts']] <- FALSE
    
        # Open workbook
        xlWB <- xlApp[["Workbooks"]]$Open(wb)
    
        # Iterate through each worksheet
        for (s in seq(xlWB$Worksheets()$Count())) {
          xlSheet <- xlWB$Worksheets(s)
    
          # Copy and paste values
          xlSheet$Cells()$Copy() 
          xlSheet$Cells()$PasteSpecial(xlPasteValues)
    
          xlApp[['CutCopyMode']] <- FALSE
    
          xlSheet$Activate()
          xlSheet$Range("A1")$Select()        
        }
    
        }, error=function(e) {
            print(e)  
        },
    
        finally={
          xlApp[['DisplayAlerts']] <- TRUE
    
          # Save Changes
          xlWB$Close(TRUE)        # THIS WILL OVERWRITE (BACKUP IN TEST MODE)
          xlApp$Quit()
    
          # Release resources
          xlSheet <- NULL
          xlWB <- NULL
          xlApp <- NULL
    
          rm(list=ls()) 
          gc()
        }
      )    
    }
    

    Since COM is a Windows technology, above solution only works in R running in Windows environments.