Tags: r, text, text-mining

How to extract text from a PDF in R properly?


I want to build my own word embedding in R. I tried to open a PDF and extract its text, but I get this error:

Error in normalizePath(path.expand(path), winslash, mustWork) : path[1]="goethe_faust.pdf": No file found

The strange thing is that this file exists and I can open it with any PDF reader. It is not password-protected or anything like that. My code:

library(pdftools)
file_vector <- list.files(path = "pdf_collections")
pdf_text <- pdf_text(file_vector[1]) 

Solution

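  • By default, list.files returns only the file names, without the directory they are in, so pdf_text looks for goethe_faust.pdf in the current working directory and fails to find it. To open the files, you need to include the path (pdf_collections). You can do this by asking list.files for the full paths: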

    file_vector <- list.files(path = "pdf_collections", full.names = TRUE)
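
To see the difference for yourself, here is a minimal sketch that uses only base R. The directory and file names (pdf_collections_demo, example_document.pdf) are made up for illustration, and the placeholder file is empty rather than a real PDF, so the pdftools call is only shown as a comment.

```r
# Hypothetical demo folder inside the session's temp directory.
dir_path <- file.path(tempdir(), "pdf_collections_demo")
dir.create(dir_path, showWarnings = FALSE)

# Empty placeholder file; it only needs to exist for this check.
file.create(file.path(dir_path, "example_document.pdf"))

# Default behaviour: bare file names without the directory.
bare_names <- list.files(path = dir_path, pattern = "\\.pdf$")

# full.names = TRUE: each name is prefixed with the directory path.
full_paths <- list.files(path = dir_path, pattern = "\\.pdf$", full.names = TRUE)

print(bare_names)
print(full_paths)

# Bare names are resolved against the working directory, so this check
# fails unless a file with the same name happens to live there.
print(file.exists(bare_names))

# Full paths point at the files that were actually listed.
print(file.exists(full_paths))

# With pdftools installed and real PDFs in the folder, the full paths
# can be passed on directly (one character string per page is returned):
# pages <- pdftools::pdf_text(full_paths[1])
```

Storing the result under a name other than pdf_text (for example pages) also avoids shadowing the pdf_text function in your workspace. An alternative that should work equally well is to keep the bare names and build the paths yourself with file.path("pdf_collections", file_vector).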