I am trying to create a single data frame in R that contains the text from multiple XML files. I have written a function that reads the XML files and extracts their text with xml_text from the xml2 package. This is my code:
read_texts <- function(folder) {
  dir_ls(folder, glob = "*.xml") %>%
    map_dfr(
      read_xml(.) %>%
        xml_text(., trim = TRUE) %>%
        tibble()
    )
}
read_texts_n <- Vectorize(read_texts)
read_texts_n("forfatterskab")
When I run this, I get the error:
Error: `x` must be a string of length 1
How do I get the code to load my files? The aim is to make a single data frame that contains all the text. I am not that experienced with XML.
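I don't think you need to Vectorize your function, since you are already using map_dfr. The error arises because the argument passed to map_dfr is a plain expression rather than a function or formula, so read_xml(.) receives the whole vector of file paths at once instead of one path at a time.
Try the function below.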
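library(xml2)
library(purrr)   # map_dfr() and the %>% pipe
library(tibble)  # tibble()
read_texts <- function(folder) {
  # list every .xml file in the folder, keeping the folder in each path
  list.files(folder, pattern = '\\.xml$', full.names = TRUE) %>%
    # read each file, extract its text, and row-bind the results
    map_dfr(~ .x %>% read_xml() %>% xml_text(trim = TRUE) %>% tibble())
}
result <- read_texts("forfatterskab")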
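The only doubt I have is about the way you pass the folder name. A bare name like "forfatterskab" only works if that folder sits in your current working directory. Usually, I would expect you to pass a complete folder path to the function, something like read_texts('Users/username/folder_name').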
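If it helps to know which row came from which file, the same idea can be sketched with named columns. This is only an illustrative variant, not part of the original answer: the function name read_texts_with_source and the column names file and text are made up here, and it assumes the xml2, purrr, tibble, and dplyr packages are installed (map_dfr needs dplyr for row-binding).

```r
library(xml2)
library(purrr)
library(tibble)

# Hypothetical variant of read_texts(): one row per XML file,
# with the file name stored next to the extracted text.
read_texts_with_source <- function(folder) {
  files <- list.files(folder, pattern = "\\.xml$", full.names = TRUE)
  map_dfr(files, function(path) {
    tibble(
      file = basename(path),                         # file name without the folder
      text = xml_text(read_xml(path), trim = TRUE)   # all text in the document
    )
  })
}

# Small self-contained check using two throwaway files in a temporary folder
demo_dir <- tempfile("xml_demo")
dir.create(demo_dir)
writeLines("<doc><p>Hello</p></doc>", file.path(demo_dir, "a.xml"))
writeLines("<doc><p>World</p></doc>", file.path(demo_dir, "b.xml"))
demo <- read_texts_with_source(demo_dir)
print(demo)
```

Passing an explicit function to map_dfr makes it clear that read_xml is called once per path, which is exactly what the original expression-based call failed to do.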