Tags: r, http-redirect, rcurl, scraper

R scraping when URL is Redirected (302)


This is an R-related problem, and I am quite new to R.

I am running a scraper on The Movie Database (themoviedb.org), but at least one URL is redirected to another page.

Do you have any idea how I could follow the URL and scrape the redirected site instead?

I've been getting the XML by using this method:

require(XML) 
require(RCurl) 
fixedURL <- getURL("https://www.themoviedb.org/movie/260346-taken-3/cast")
parsed.html <- htmlParse(fixedURL)

However, the URL is redirected (302) to "https://www.themoviedb.org/movie/260346-tak3n/cast". Any ideas how I can make it follow the redirection? (It is part of a loop, and there are very few redirections.)

I could also use the scrapeR package if that would help.


Solution

  • The rvest package seems to follow the redirect and land on the correct page:

    library("rvest")
    url <- "https://www.themoviedb.org/movie/260346-taken-3/cast"
    # get movie title
    # (read_html() replaces html(), which was deprecated in later rvest releases)
    url %>% 
      read_html() %>% 
      html_nodes("#mainCol :nth-child(1) :nth-child(1) :nth-child(1) :nth-child(1)") %>%
      html_text()
    
    [1] "Taken 3"
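
If you would rather stay with RCurl and XML, as in the question, something along these lines should also work. This is only a sketch and I have not run it against this site. `getURL()` passes extra arguments through to libcurl as options, and `followlocation = TRUE` asks libcurl to follow 3xx redirects instead of returning the redirect response. The helper name `fetch_following_redirects` is made up for illustration and is not part of either package.

```r
library(RCurl)
library(XML)

# Hypothetical helper (not from the original post): download a page with
# RCurl, letting libcurl follow any redirects, then parse the returned HTML.
fetch_following_redirects <- function(url) {
  page <- getURL(url, followlocation = TRUE)  # follow 302 and other redirects
  htmlParse(page, asText = TRUE)              # parse the HTML text, not a file path
}

# Example use inside a loop (left commented out because it needs network access):
# urls <- c("https://www.themoviedb.org/movie/260346-taken-3/cast")
# parsed <- lapply(urls, fetch_following_redirects)
```

Since only a few URLs in the loop are redirected, setting the option on every request is probably simpler than detecting the 302 and retrying by hand.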