
How can I follow all redirects of a URL in R?


Suppose I have the following url:

http://linkinghub.elsevier.com/retrieve/pii/S1755534516300379

When entering this into my standard desktop browser, I get redirected to:

http://www.sciencedirect.com/science/article/pii/S1755534516300379?via%3Dihub

However, I have not been able to reproduce this in R. I tried the packages httr and RCurl. According to the httr documentation, the response returned by GET, called as follows:

library(httr)
GET("http://linkinghub.elsevier.com/retrieve/pii/S1755534516300379")

is supposed to contain the actual URL used (after any redirects). But when I extract the url field of the response:

GET("http://linkinghub.elsevier.com/retrieve/pii/S1755534516300379")$url

I don't get the final redirect target. I would very much appreciate your help!


Solution

  • This site redirects with JavaScript, not with an HTTP redirect. httr only follows HTTP redirects (a 3xx status with a Location header), so GET stops at the page that contains the script. To find the final URL you have to interpret the content of the downloaded document.

    If all the documents you want to resolve come from this one site, you can extract the redirect URL directly from the downloaded HTML.

    If you need to handle many different sites with different redirect mechanisms, you will need a library that actually loads the page in a browser and runs the JavaScript, for example RSelenium.
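
For the single-site case, extracting the target from the downloaded document could look roughly like the sketch below. It is only an illustration: extract_client_redirect and resolve_redirect are hypothetical helper names, and the two patterns (a meta refresh tag and a window.location assignment) are assumptions about how a page might encode its target. I have not checked which mechanism the Elsevier page actually uses, so look at the HTML you get back and adjust the patterns.

```r
# Hypothetical helper: search raw HTML for a client-side redirect target.
# Returns the first URL found, or NA if neither pattern matches.
extract_client_redirect <- function(html) {
  patterns <- c(
    # <meta http-equiv="refresh" content="0; url=http://...">
    "http-equiv=[\"']?refresh[\"']?[^>]*url=[\"']?([^\"'> ]+)",
    # window.location = "http://..." or window.location.href = '...'
    "window\\.location(?:\\.href)?\\s*=\\s*[\"']([^\"']+)[\"']"
  )
  for (p in patterns) {
    # regexec returns the full match followed by the capture group
    m <- regmatches(html, regexec(p, html, ignore.case = TRUE, perl = TRUE))[[1]]
    if (length(m) == 2) return(m[2])
  }
  NA_character_
}

# Hypothetical wrapper: let httr follow the HTTP redirects first, then look
# inside the returned document for a client-side redirect.
resolve_redirect <- function(url) {
  resp <- httr::GET(url)
  html <- httr::content(resp, as = "text", encoding = "UTF-8")
  target <- extract_client_redirect(html)
  # Fall back to the last URL httr reached if nothing was found
  if (is.na(target)) resp$url else target
}

# Usage (needs network access and the httr package):
# resolve_redirect("http://linkinghub.elsevier.com/retrieve/pii/S1755534516300379")
```

Regular expressions are a shortcut here. They will miss redirects that the script assembles at runtime, meta tags whose attributes appear in a different order, and relative or HTML-escaped URLs, which is where a real browser driven through RSelenium becomes the more reliable option.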