I have a folder that contains *.docx files. I want to turn the script below into some sort of loop or function that reads all the .docx files, but I don't really know how to write an R function. Could someone please guide me?
library(docxtractr)
real_world <- read_docx("C:/folder/doc1.docx")
docx_tbl_count(real_world)
tbls <- docx_extract_all_tbls(real_world)
a <- as.data.frame(tbls)
Ideally, it should append the new tables every time another document is extracted.
Thanks, Peddie
Edit: For this answer I assumed that the OP did not use the term "function" in the sense of an R function, but simply meant an approach (algorithm) for solving the problem.
#### load packages ####
library(docxtractr)
library(plyr)
#### load data ####
# define path of dir
pathto <- "stackoverflow/41251392/example/"
# get the path of every .docx file in dir (pattern is a regular expression, not a wildcard)
filelist <- list.files(path = pathto, pattern = "\\.docx$", full.names = TRUE)
# read every file with docxtractr::read_docx()
doclist <- lapply(filelist, read_docx)  # one docx object per file
# extract every table from every file with docxtractr::docx_extract_all_tbls()
tables <- lapply(doclist, docx_extract_all_tbls)
#### append data to create one data.frame ####
# combine extracted tables with plyr::ldply()
alltables <- ldply(lapply(tables, function(x) {ldply(x, data.frame)}), data.frame)
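The last line is a bit difficult to read: the inner ldply(x, data.frame) call stacks all tables extracted from one document into a single data.frame, and the outer ldply() then stacks those per-document data.frames into one. Take a look at ?plyr::ldply for details.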
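Since the question also asks how to write an R function, the same steps can be wrapped into one. The following is a minimal sketch, not a definitive implementation: read_all_docx_tables() and combine_tables() are hypothetical names, the final stacking uses base R (a loop plus do.call(rbind, ...)) instead of plyr::ldply(), and a source_file column is added so each row can be traced back to its document. It assumes that all extracted tables share the same column names, because base rbind() stops with an error when they differ.

```r
# Minimal sketch, not a definitive implementation.
# read_all_docx_tables() and combine_tables() are hypothetical helper names.
# Assumption: every extracted table has the same column names,
# because base rbind() stops with an error when they differ.

# Stack a list of per-document table lists into one data.frame,
# adding a source_file column that records where each row came from.
combine_tables <- function(tables, files) {
  pieces <- list()
  for (i in seq_along(tables)) {          # loop over documents
    for (tbl in tables[[i]]) {            # loop over the tables of one document
      tbl <- as.data.frame(tbl)
      tbl$source_file <- rep(basename(files[i]), nrow(tbl))
      pieces[[length(pieces) + 1]] <- tbl # collect instead of growing a data.frame
    }
  }
  do.call(rbind, pieces)                  # bind everything once; NULL if no tables
}

# Read every .docx file in dir and return all of their tables as one data.frame.
read_all_docx_tables <- function(dir) {
  files  <- list.files(path = dir, pattern = "\\.docx$", full.names = TRUE)
  docs   <- lapply(files, docxtractr::read_docx)
  tables <- lapply(docs, docxtractr::docx_extract_all_tbls)
  combine_tables(tables, files)
}
```

With the folder from the question, usage would be a single call such as alltables <- read_all_docx_tables("C:/folder/"). Collecting the pieces in a list and binding once at the end avoids re-copying a growing data.frame on every iteration. If the tables have differing columns, replacing do.call(rbind, pieces) with plyr::rbind.fill(pieces) fills the missing columns with NA.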