Search code examples
rfor-loopdplyredit

How to edit randomly chosen cells in a specific column of a spreadsheet and repeat this task for several spreadsheets?


The task is to create several different spreadsheet files for a data analysis exam. I want to have 50 versions of a spreadsheet that are based on the same data, but every file should have some rows with randomly edited values.

For example, the data sheet is based on marks of a list of students in various subjects. I want to create 50 spreadsheets where randomly chosen cells are edited for a specific column. Hence, the mean/median of the marks in that one subject will be different for every data file.

The workflow that I believe needs to be followed is :

  1. Read the table into R/Python as a data object from a worksheet of the first spreadsheet file
  2. Choose cells from a specific column at random and edit them using randbetween() function
  3. Overwrite the worksheet in the existing spreadsheet file
  4. Repeat this with a for loop over all the spreadsheet files in a folder

I request help from the community to create a R/Python script to do this.

I used the following R code

list_of_files <- dir("folder1")

# folder1 has 50 spreadsheets and is in the working directory

for(f in list_of_files){
          df1<-openxlsx::read.xlsx("f", sheet = "data", colNames=TRUE)
          df1[sample(.N,1), math_marks := ExcelFunctionsR::RANDBETWEEN(60, 95, number = 1)]
          openxlsx::write.xlsx(df1, file = "f", sheetName="data",colNames = TRUE)
 }

The list_of_files objects shows "file1.xlsx" "file2.xlsx" and so on. I see the value of f as "file1.xlsx" in Environment

The Error is in read.xlsx.default("f", sheet = "data", colNames = TRUE) : File does not exist.

or when I edit the "f" to f:

Error in read.xlsx.default(f, sheet = "data", colNames = TRUE) : File does not exist.


Solution

  • Here is a way to run your loop that should work (it worked on my sample files - replace "your_file_path" with your file path). Using openxlsx::write.xlsx() writes a whole new workbook, so if you just want to modify a single sheet, you will want to open the workbook (wb) using openxlsx::loadWorkbook() then use openxlsx::writeData() to overwrite the specific sheet:

    list_of_files <- list.files(path = "your_file_path", pattern = '*.xlsx', full.names = TRUE)
    
    for(f in list_of_files){
      wb <- openxlsx::loadWorkbook(f)
      df1 <- openxlsx::read.xlsx(f, sheet = "data", colNames=TRUE)
      idx <- sample(1:nrow(df1), 10)
      df1[idx, "math_marks"] <- round(runif(length(idx), 65, 95))
      openxlsx::writeData(wb, sheet = "data", df1, colNames = TRUE)
    }