I am trying to scrape reviews from Imdb movies using python3.6. However when I print my 'review', only 1 review pops up and I am not sure why the rest does not pop up. This does not happen for my 'review_title'. Any advise or help is greatly appreciated as I've been searching forums and googling but no avail.
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
url = urlopen('http://www.imdb.com/title/tt0111161/reviews?ref_=tt_ov_rt').read()
soup = BeautifulSoup(url,"html.parser")
print(soup.prettify())
review_title = soup.find("div",attrs={"class":"lister"}).findAll("div",{"class":"title"})
review = soup.find("div",attrs={"class":"text"})
review = soup.find("div",attrs={"class":"text"}).findAll("div",{"class":"text"})
rating = soup.find("span",attrs={"class":"rating-other-user-rating"}).findAll("span")
Without creating any loop how can you reach all the content of that page? The way you have written your script is exactly doing what it is supposed to do (parsing the single review content).Try the below way instead. It will fetch you all the visible data.
from urllib.request import urlopen
from bs4 import BeautifulSoup
url = urlopen('http://www.imdb.com/title/tt0111161/reviews?ref_=tt_ov_rt').read()
soup = BeautifulSoup(url,"html.parser")
for item in soup.find_all(class_="review-container"):
review_title = item.find(class_="title").text
review = item.find(class_="text").text
try:
rating = item.find(class_="point-scale").previous_sibling.text
except:
rating = ""
print("Title: {}\nReview: {}\nRating: {}\n".format(review_title,review,rating))