
Loop to merge all PDF files from subfolders


I am trying to create a loop that merges all the PDFs in each subfolder, producing one merged PDF per subfolder, named after that subfolder.

For example, I have a folder "23201018" that contains subfolders, plus other documents that are irrelevant:

  • "T34709"
  • "T8257"
  • etc

Inside each subfolder there are several PDFs that I would like to merge, giving the resulting merged PDF the name of the subfolder.

I managed to run this for one subfolder at a time:

library(staplr)

setwd(WDPATH)
wd <- getwd() # setwd() returns the previous directory, so read the new one back
list.files(paste0(wd, "/23201018")) # Folder with subfolders inside
#  "23201018.XLSX" "T34709" "T26045" "T85625"
list.files(paste0(wd, "/23201018/T34709")) # I tried one subfolder at a time


# Creates a joined PDF from all the documents in the folder
staple_pdf(
  input_directory = paste0(wd, "/23201018/T34709"),
  input_files = NULL,
  output_filepath = paste0(wd, "/23201018/T34709/", basename(paste0(wd, "/23201018/T34709")), ".pdf"),
  overwrite = FALSE
)

How can I do this inside a loop?


Solution

  • Here is an example that assumes that your current working directory contains a folder called “23201018”. Within that folder are subfolders “one” and “two” which contain pdf files. There is another folder “no_pdf”, which has no pdf files.

    1. Gather all folders that have pdf files in them.
    2. Create a vector with the output file path for each merged pdf.
    3. Use mapply() to feed it all into the staplr::staple_pdf() function.
    library(staplr)
    
    list.dirs("23201018/")
    #> [1] "23201018/"        "23201018//no_pdf" "23201018//one"    "23201018//two"
    
    # Folders that contain at least one pdf file
    folder_names <- 
      list.files("23201018/", recursive = TRUE, pattern = "\\.pdf$",
                 full.names = TRUE) |> 
      dirname() |> 
      unique()
    
    # One output path per folder, placed next to it (e.g. "23201018//one.pdf")
    pdf_names <- 
      paste0(folder_names, ".pdf") 
    
    # Merge the pdf files of each folder into its output file
    mapply(\(x, y) staple_pdf(input_directory = x,
                              output_filepath = y), 
           folder_names,
           pdf_names)
    #> 23201018//one 23201018//two 
    #>          TRUE          TRUE
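
Since the question asks for a loop, the same idea can also be sketched with an explicit for loop. This is only one possible arrangement, and it rests on a few assumptions: `plan_merges()` is a hypothetical helper written for this example (it is not part of staplr), only the immediate subfolders of "23201018" are considered, and each merged file is written next to its subfolder rather than inside it.

```r
# Hypothetical helper (not part of staplr): list the immediate subfolders
# of `root` that contain pdf files, with one output path per subfolder.
plan_merges <- function(root) {
  # recursive = FALSE returns only the direct subfolders, not `root` itself
  subfolders <- list.dirs(root, full.names = TRUE, recursive = FALSE)

  # Keep only the subfolders that hold at least one pdf file
  has_pdf <- vapply(
    subfolders,
    function(d) length(list.files(d, pattern = "\\.pdf$", ignore.case = TRUE)) > 0,
    logical(1)
  )
  subfolders <- subfolders[has_pdf]

  data.frame(
    input_directory = subfolders,
    # Output goes to <root>/<subfolder name>.pdf
    output_filepath = file.path(root, sprintf("%s.pdf", basename(subfolders))),
    stringsAsFactors = FALSE
  )
}

root <- "23201018" # assumed to sit in the current working directory
plan <- plan_merges(root)

# Only attempt the merge when staplr is installed
if (requireNamespace("staplr", quietly = TRUE)) {
  for (i in seq_len(nrow(plan))) {
    staplr::staple_pdf(
      input_directory = plan$input_directory[i],
      output_filepath = plan$output_filepath[i]
    )
  }
}
```

Writing the merged file next to the subfolder, instead of inside it, keeps a second run from picking up the earlier merged file as one of its inputs. If the merged file should live inside the subfolder, as in the question, `output_filepath` could be built with `file.path(subfolders, sprintf("%s.pdf", basename(subfolders)))` instead.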