The task is to create several different spreadsheet files for a data analysis exam. I want to have 50 versions of a spreadsheet that are based on the same data, but every file should have some rows with randomly edited values.
For example, the data sheet is based on marks of a list of students in various subjects. I want to create 50 spreadsheets where randomly chosen cells are edited for a specific column. Hence, the mean/median of the marks in that one subject will be different for every data file.
The workflow that I believe needs to be followed is :
I request help from the community to create a R/Python script to do this.
I used the following R code
list_of_files <- dir("folder1")
# folder1 has 50 spreadsheets and is in the working directory
for(f in list_of_files){
df1<-openxlsx::read.xlsx("f", sheet = "data", colNames=TRUE)
df1[sample(.N,1), math_marks := ExcelFunctionsR::RANDBETWEEN(60, 95, number = 1)]
openxlsx::write.xlsx(df1, file = "f", sheetName="data",colNames = TRUE)
}
The list_of_files objects shows "file1.xlsx" "file2.xlsx" and so on. I see the value of f as "file1.xlsx" in Environment
The Error is in read.xlsx.default("f", sheet = "data", colNames = TRUE) : File does not exist.
or when I edit the "f" to f:
Error in read.xlsx.default(f, sheet = "data", colNames = TRUE) : File does not exist.
Here is a way to run your loop that should work (it worked on my sample files - replace "your_file_path" with your file path). Using openxlsx::write.xlsx()
writes a whole new workbook, so if you just want to modify a single sheet, you will want to open the workbook (wb
) using openxlsx::loadWorkbook()
then use openxlsx::writeData()
to overwrite the specific sheet:
list_of_files <- list.files(path = "your_file_path", pattern = '*.xlsx', full.names = TRUE)
for(f in list_of_files){
wb <- openxlsx::loadWorkbook(f)
df1 <- openxlsx::read.xlsx(f, sheet = "data", colNames=TRUE)
idx <- sample(1:nrow(df1), 10)
df1[idx, "math_marks"] <- round(runif(length(idx), 65, 95))
openxlsx::writeData(wb, sheet = "data", df1, colNames = TRUE)
}