Tags: r, web-scraping, rvest, httr

rvest not suitable for this scraping functionality


I am trying to use R to scrape the film titles from this webpage, but rvest isn't turning out to be a good tool for the job.

My code:

library(rvest)

url <- "https://letterboxd.com/crew/list/most-fans-on-letterboxd-with-pronoun-she/"

title <- read_html(url) %>%
  html_nodes("span .frame-title") %>% # selector
  html_text()

Which should give me the title associated with the given node (using example: the film Her (2013))...

<span class="frame-title" data-reactid=".c.3.1">Her (2013)</span>

...but instead I get blank ("") output each time and for each slot.

I was considering the RCurl package, but I don't know whether it would actually help with extracting these nodes. I'd like to grab the titles under "frame-title" on this webpage. Any assistance would be greatly appreciated.


Solution

  • The page source for that website does not look like what you have posted. The titles appear in the alt attribute of the img elements, so the following should work:

    read_html(url) %>% 
        html_nodes("img") %>% 
        html_attr("alt")
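To see why the two selectors behave differently, here is a minimal sketch that runs on an inline HTML string instead of the live site. The snippet is a hypothetical approximation of the static page source, not the actual Letterboxd markup: it assumes the frame-title span is empty until JavaScript fills it in (the data-reactid attribute in the question suggests this, but that is an assumption), while the title is already present in the img alt attribute. The file names and the second film title are made up for illustration.

```r
library(rvest)

# Hypothetical static markup (an assumption, not copied from the site):
# the span exists but is empty, and the title lives in the img alt attribute.
html <- '<ul>
  <li><img src="a.jpg" alt="Her"><span class="frame-title"></span></li>
  <li><img src="b.jpg" alt="Example Film"><span class="frame-title"></span></li>
</ul>'

page <- read_html(html)

# Reading the span text gives one empty string per node,
# matching the blank output described in the question.
span_text <- page %>%
  html_nodes("span.frame-title") %>%
  html_text()

# Reading the alt attribute of each img recovers the titles.
alt_text <- page %>%
  html_nodes("img") %>%
  html_attr("alt")

print(span_text)
print(alt_text)
```

On the real page, html_nodes("img") may also pick up images that are not film posters, so the result may need filtering, for example by dropping NA or empty alt values.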