To get some basic insight into a couple of hundred PDFs, I want to calculate the readability score (Flesch-Kincaid) of each one and present the results in a spreadsheet. My R skills are limited and I can't find a solution myself, so I'm looking for something very basic. This is what I have so far:
library(tm)
directory <- "my_folder"
my_corpus <- VCorpus(DirSource(directory, pattern = "\\.pdf$"),
                     readerControl = list(reader = readPDF, language = "dutch"))
However, when I pass this corpus to quanteda, I get the error message 'row names supplied are of the wrong length' from the following call:
textstat_readability(corpus(my_corpus), measure = "Flesch.Kincaid")
Is there a way to remedy this, or does an alternative exist?
Yes: avoid the tm workflow and read the PDFs directly with readtext.
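library(quanteda)
library(quanteda.textstats)  # textstat_readability() lives here in quanteda >= 3
directory <- "my_folder"
my_corpus <- readtext::readtext(paste0(directory, "/*.pdf"))
textstat_readability(corpus(my_corpus), measure = "Flesch.Kincaid")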
But keep in mind that the syllable counter that many readability measures (including Flesch-Kincaid) rely on was built for English, so it may not count syllables correctly in Dutch text.
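Since the question also asks for a spreadsheet, here is a minimal sketch of how the scores could be written to a CSV file that Excel or LibreOffice can open. The function name write_readability_csv and the output file name are made up for illustration, and the sketch assumes quanteda >= 3 with the readtext and quanteda.textstats packages installed.

```r
# Hypothetical helper: score every PDF in a folder and save the result as CSV.
# Assumes readtext, quanteda and quanteda.textstats (quanteda >= 3) are installed.
write_readability_csv <- function(directory,
                                  out_file = file.path(directory, "readability_scores.csv")) {
  # Read all PDFs in the folder; readtext returns one row per file
  pdfs <- readtext::readtext(file.path(directory, "*.pdf"))

  # Data frame with a document column and a Flesch.Kincaid column
  scores <- quanteda.textstats::textstat_readability(
    quanteda::corpus(pdfs),
    measure = "Flesch.Kincaid"
  )

  # One row per PDF, without R's row numbers
  write.csv(scores, out_file, row.names = FALSE)
  invisible(scores)
}
```

Calling write_readability_csv("my_folder") should then leave a readability_scores.csv in that folder, with one row per PDF. The caveat about Dutch syllable counts applies to these scores as well.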