Hi, I have a number of PDF files saved in one folder. Every PDF file contains several currency values starting with $, and I want to extract the first currency value in each file. I am able to do this for a single file, but not when looping through multiple files, where I want the output from each file, like $xxx,xxx,xxx $xxx,xxx,xxx $xxx,xxx,xxx.
Code I am using for a single file:
''''
library(pdftools)  # pdf_text()
library(stringr)   # str_extract_all()
text_data <- pdf_text('Sample2.pdf')
text_collapsed_data <- paste0(text_data, collapse = '\n')
k <- str_extract_all(text_collapsed_data, "\\$\\d+(?:,\\d+)(?:,\\d+)")[[1]]
k[1]
''''
Code I am using to loop over multiple files:
''''
files <- list.files(pattern = "pdf$")
for (i in 1:length(files)){
print(i)
pdf_text(paste(str_extract_all("~filepath/desktop",files[i], "\\$\\d+(?:,\\d+)(?:,\\d+)")[[1]]))
}
''''
I am getting the error "subscript out of bounds". Please let me know where I am going wrong.
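In your loop, str_extract_all() is called on the folder path and file name rather than on the text of the PDF, and pdf_text() is then called on that result, so the file contents are never read. Wrap your working single-file code in a function and apply it to each file: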
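library(pdftools)
library(stringr)

# Return the first currency value found in one PDF file
myextr <- function(pdffile) {
  text_data <- pdf_text(pdffile)
  text_collapsed_data <- paste0(text_data, collapse = '\n')
  k <- str_extract_all(text_collapsed_data, "\\$\\d+(?:,\\d+)(?:,\\d+)")[[1]]
  k[1]
}

# Apply it to every PDF in the working directory
files <- list.files(pattern = "pdf$")
sapply(files, myextr)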
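
If the PDFs are not in the working directory, or some files may contain no matching value, a slightly more defensive variant might look like the sketch below. The names first_currency and extract_from_folder and the folder argument are hypothetical, introduced only for illustration. The sketch uses str_extract(), which returns only the first match (or NA when there is none), and full.names = TRUE so that pdf_text() receives a complete path.

```r
library(stringr)

# Hypothetical helper: first "$x,xxx,xxx"-style value in a character
# vector of page texts, or NA if nothing matches.
first_currency <- function(pages) {
  text <- paste0(pages, collapse = "\n")
  str_extract(text, "\\$\\d+(?:,\\d+)(?:,\\d+)")
}

# Hypothetical wrapper: apply the helper to every PDF in a folder.
# The default folder "." is an assumption; requires the pdftools package.
extract_from_folder <- function(folder = ".") {
  files <- list.files(folder, pattern = "\\.pdf$", full.names = TRUE)
  vapply(files, function(f) first_currency(pdftools::pdf_text(f)),
         character(1))
}

# Quick check on plain strings, no PDF needed:
first_currency(c("Invoice total $1,234,567", "Tax $89,000,123"))
# "$1,234,567"
```

Calling extract_from_folder("path/to/pdfs") should then return a named character vector with one entry per file, with NA for any file that has no value in this format.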