Tags: java, r, image, browser-cache, rselenium

Trying to download a cached picture in RSelenium


I am using RSelenium to download a series of newspaper articles from an online repository. So far I have been capturing each page with remDr$screenshot(), but the screenshots suffer from resolution, zoom and framing problems, so I wonder whether it is possible to download the picture itself, exactly as the page presents it. The sample code to access a page is the following:

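library(RSelenium)

# Start a Selenium server with a Firefox client on port 4567
rD1 <- rsDriver(browser = "firefox", port = 4567L)
remDr <- rD1[["client"]]

# Build the page URL from its two parts and open it in the browser
url1 <- "http://memoria.bn.br/DocReader/DocReader.aspx?"
url2 <- "bib=090972_07&pesq=cangaceiro&pasta=ano%20192"
remDr$navigate(paste0(url1, url2))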

Looking at the page source, I see that the image (the element with id DocumentoImg) is served from a cache URL: cache/2286106490137/I0000051-20Alt=000869Lar=000615LargOri=005060AltOri=007149.JPG. Is there a way to simply download it from this address, without relying on screenshots?


Solution

  • Yes, you can download the image directly in R like this:

    # I have split the URL just to make it legible on screen here
    url_pt1  <- "http://memoria.bn.br/DocReader/cache/2627304510157"
    url_pt2  <- "/I0000051-20Alt=001984Lar=001404LargOri=005060AltOri=007149.JPG"
    big_url  <- paste0(url_pt1, url_pt2)
    
    # Choose the local file to save the download to
    file_to  <- "download.jpg"
    
    # mode = "wb" writes in binary mode, which image files need on Windows
    download.file(big_url, file_to, mode = "wb")
    #> trying URL 'http://memoria.bn.br/DocReader/cache/2627304510157
    #> /I0000051-20Alt=001984Lar=001404LargOri=005060AltOri=007149.JPG'
    #> Content type 'text/html; charset=utf-8' length 8457 bytes
    #> downloaded 8457 bytes
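
The cache path in the answer differs from the one quoted in the question, so it may change between sessions, and the transcript above reports a content type of text/html, which could mean the server returned an HTML page rather than the JPEG. A minimal sketch of one way to handle both points, assuming the image element keeps the id DocumentoImg: read the src attribute from the live page, download it in binary mode, and check that the saved file starts with the JPEG signature. The helper names resolve_src, is_jpeg and download_page_image are hypothetical, and the base URL is taken from the question.

```r
# Hypothetical helper: build an absolute URL from the <img> src value.
# Handles only two cases: an already absolute http(s) URL, or a path
# relative to the DocReader directory (assumed from the question).
resolve_src <- function(src, base = "http://memoria.bn.br/DocReader/") {
  if (grepl("^https?://", src)) src else paste0(base, src)
}

# Hypothetical helper: TRUE if the file begins with the JPEG
# signature bytes FF D8 FF, FALSE otherwise (e.g. an HTML page).
is_jpeg <- function(path) {
  magic <- readBin(path, what = "raw", n = 3L)
  identical(magic, as.raw(c(0xFF, 0xD8, 0xFF)))
}

# Hypothetical wrapper: `remDr` is an RSelenium client that has
# already navigated to the page, as in the question.
download_page_image <- function(remDr, dest = "page.jpg") {
  img <- remDr$findElement(using = "id", value = "DocumentoImg")
  src <- img$getElementAttribute("src")[[1]]
  download.file(resolve_src(src), dest, mode = "wb")
  if (!is_jpeg(dest)) {
    warning("Downloaded file does not look like a JPEG; ",
            "the server may have returned an HTML page instead.")
  }
  invisible(dest)
}

# Usage, once remDr$navigate(...) has loaded the page:
# download_page_image(remDr, "page_0051.jpg")
```

If the warning fires, the cache URL may only be valid inside the browser session, in which case a plain download.file() request would not be enough on its own.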