I am trying to do some text mining on a PDF by searching for certain keywords.
This is my code:
library(pdftools)
library(tidyverse)
library(pdfsearch)
UC_text <- pdf_text("https://wilmar-iframe.todayir.com/attachment/20190411162436345449392_en.pdf")
result <- keyword_search(UC_text,
                         keyword = c('SUBSTANTIAL SHAREHOLDERS'),
                         path = TRUE, surround_lines = 1)
However, I get a "filename too long" error. How can I get around this?
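According to the pdfsearch manual on CRAN, you can pass the PDF link directly to keyword_search(). With path = TRUE, the function treats its first argument as the location of a PDF and reads the file itself, so handing it the text already extracted by pdf_text() makes it try to open that whole text as a filename. Passing the link instead, I do not get the error you reported; I get the following result: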
result <- keyword_search("https://wilmar-iframe.todayir.com/attachment/20190411162436345449392_en.pdf",
                         keyword = c('SUBSTANTIAL SHAREHOLDERS'),
                         path = TRUE, surround_lines = 1)
  keyword                  page_num line_num line_text token_text
  <chr>                       <int>    <int> <list>    <list>
1 SUBSTANTIAL SHAREHOLDERS       49     2010 <chr [3]> <list [3]>
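If you would rather keep the pdf_text() step from the question, a possible alternative is to leave the extracted text as the first argument and set path = FALSE, so that keyword_search() searches the text directly instead of treating it as a file location. This is only a sketch: it assumes the URL is reachable and that your installed pdfsearch version accepts pdf_text() output when path = FALSE, as its documentation describes. The last line shows one way to look inside the line_text list column.

```r
library(pdftools)
library(pdfsearch)

pdf_url <- "https://wilmar-iframe.todayir.com/attachment/20190411162436345449392_en.pdf"

# Extract the text once: one character string per page
UC_text <- pdf_text(pdf_url)

# path = FALSE: x is treated as already-extracted text, not a file path
result <- keyword_search(UC_text,
                         keyword = c('SUBSTANTIAL SHAREHOLDERS'),
                         path = FALSE, surround_lines = 1)

# line_text is a list column; pull out the matched line and its neighbours
result$line_text[[1]]
```

Extracting the text once can be handy if you plan to run several searches on the same document, since the PDF is not fetched again for each call.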