I am working on a sentiment analysis project. My aim is to scrape all the reviews of a particular movie from the Rotten Tomatoes website. I tried scraping the page, but the output is full of unexpected characters rather than the reviews I want. Any suggestions would be highly appreciated.
I am using this function:
dune_movie <- read_html("https://www.rottentomatoes.com/m/dune_2021/reviews")
Output I am getting:
<html lang="en" dir="ltr" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://opengraphprotocol.org/schema/">
[1] <head prefix="og: http://ogp.me/ns# flixstertomatoes: http://ogp.me/ns/ap ...
[2] <body class="body no-touch">\n\n \n\n <div id="emptyPlaceho ...
read_html() gives you the HTML of the entire page as a navigable document tree. If you want to scrape the reviews, you have to find the elements on the page that contain the reviews:
library(rvest)
read_html("https://www.rottentomatoes.com/m/dune_2021/reviews") %>%
  html_elements(xpath = "//div[@class='the_review']") %>%
  html_text()
#> [1] "\r While Dune 2021 is a very well presented and styled gourmet sci-fi dish, for a platter stuffed with \"spice\", it unfortunately lacks a ton of Dune flavor.\r"
#> [2] "\r Feels more like an obscure, scattered conversation overheard on a long train ride, peaking early with Rampling’s natural mystique, and then hitting a downward spiral – all dense plot and mild tedium, a bounty of sensual imagery wasted on zero substance.\r"
#> [3] "\r Paul's family is evidently descended from a long line of matadors.\r"
#> [4] "\r The paramount attractions, the visuals, connect immediately. The landscapes, costuming, sets, makeup, creatures and wild special effects are stunning.\r"
#> [5] "\r Duna works sometimes like a strange abstract opera in which we perceive more the intensities than the representation of the events, and sometimes not only do we not know what part of the story we are in, but what is concretely happening.\r"
#> [6] "\r To promise a whole series of films might be dressed up as a gift for fans, but theres a lingering cynicism about this project a feeling that its essentially a way of maximising returns at the box office.\r"
#> [7] "\r This Dune toys with the idea of genocide, but it’s mostly a movie for people who like to memorize things. All these stupid names, one after another. \r"
#> [8] "\r Dune might not be for everyone; but if you strap in, immerse yourself in the world and go along for the ride, Denis Villenueve delivers a blockbuster sci-fi epic that's regularly jaw-dropping.\r"
#> [9] "\r A dishwater war narrative masquerading as sci-fi. Pshaw.\r"
#> [10] "\r A blockbuster celebrating the awe-inspiring power of the big screen that everyone can get behind.\r"
#> [11] "\r If nothing else, Dune wins its place as a masterpiece of adaptation, truncating roughly half the novel into its runtime to expand the books monomaniacal focus on Paul into a more ensemble narrative.\r"
#> [12] "\r Mature yet juvenile, otherworldly yet pleasingly familiar, “Dune” demands to be experienced on the biggest screen you can find (sandworms!).\r"
#> [13] "\r Dune: Part One will leave fans not only wanting, but hoping for more. The spice, in other words, must flow.\r"
#> [14] "\r Dune is another triumph for Villeneuve and I can’t wait to see him finish his epic.\r"
#> [15] "\r Denis Villeneuve’s windswept epic is engrossing enough to maintain an audience with an intermission and a running time twice its length.\r"
#> [16] "\r From the costumes to the enormous machinery and craft of mining spice the film presents a beautifully realised and consistent universe.\r"
#> [17] "\r Dune falls under a high-brow take on science fiction. It is an entertaining cinematic feat to say the least. The franchise has a bright future.\r"
#> [18] "\r \"Perhaps the best, or at least the most revealing, thing that can be said about Denis Villeneuves grandly mounted adaptation of Frank Herberts 'Dune' is that he makes it look easy.\"\r"
#> [19] "\r Many people will encourage you to see new releases on as big a screen as possible. With a masterful spectacle like Dune, its practically a commandment.\r"
#> [20] "\r Its stateliness is both an asset and a detriment.\r"
Created on 2022-05-28 by the reprex package (v2.0.1)
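The extracted strings above still carry leading and trailing whitespace ("\r" and spaces), which you will probably want to remove before running sentiment analysis. Below is a minimal sketch of that clean-up step. It assumes rvest 1.0.0 or later, and it uses a small made-up HTML snippet built with minimal_html() in place of the live page so that it runs offline; the sample review texts and the variable names page, raw_reviews and reviews are illustrative only. The class name the_review is the one used above and may no longer match the site's current markup.

```r
library(rvest)

# Hypothetical stand-in for the reviews page (not real Rotten Tomatoes markup),
# so the example runs without a network connection.
page <- minimal_html('
  <div class="the_review">\r First sample review.\r</div>
  <div class="the_review">\r Second sample review.\r</div>
  <div class="other">Not a review.</div>
')

# Select only the review containers and pull out their text.
raw_reviews <- page %>%
  html_elements(xpath = "//div[@class='the_review']") %>%
  html_text()

# trimws() removes leading/trailing spaces, tabs, "\r" and "\n" by default.
reviews <- trimws(raw_reviews)
reviews
```

For the real page, replace the minimal_html() call with read_html() on the reviews URL and keep the rest of the pipeline unchanged.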