python-2.7, runtime-error, urllib2

Unable to scrape a particular site using Python and urllib2


I am using urllib2 to fetch a web page, and it does not seem to work with zomato.com, although it works with several other sites that I tried. I have tried disabling my firewall and antivirus, and wrapping the urlopen line in a try/except block, as I found suggested here. I have also tried code from the internet that is reported to work, but I still get this error message:

v = self._sslobj.read(len or 1024)
socket.error: [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

What can be the possible reasons?

Here is the simple code for it.

import urllib2
from bs4 import BeautifulSoup

def extract_link(url):
    # Download the raw HTML of the page
    page = urllib2.urlopen(url).read()
    # Name the parser explicitly so bs4 does not have to guess one
    return BeautifulSoup(page, 'html.parser')

def main():
    link = 'https://www.zomato.com/kolkata'
    soup = extract_link(link)
    print soup.prettify()

if __name__ == '__main__':
    main()

Solution

  • zomato.com is a dynamic website that relies heavily on JavaScript and AJAX to fill in its content, so you need a browser to get that content. On top of that, zomato filters out requests that it identifies as coming from bots. For both reasons, I would recommend Selenium: it doesn't simulate a browser session, it is a browser session. Writing for Selenium basically means writing a set of actions and feeding them to a browser (usually Firefox, but it works with others too). You could also use PhantomJS, a headless browser that Selenium can drive from Python, as an alternative.

    Here is some help with setting up Selenium and replicating your code above.

    1. Download chromedriver.exe from https://chromedriver.storage.googleapis.com/index.html?path=2.25/
    2. Extract the .exe file into a folder named "Chrome Driver".
    3. Move that folder to the root of your C: drive, so that the file ends up at C:\Chrome Driver\chromedriver.exe.
    4. Run pip install selenium.

    Then run this code:

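    import time
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

    # Raw string, so the backslashes in the Windows path are not read as escapes.
    # Passing the path positionally is the older Selenium style; newer releases
    # expect the driver path to be supplied through a Service object instead.
    driver = webdriver.Chrome(r"C:\Chrome Driver\chromedriver")
    driver.get("https://www.zomato.com/kolkata")
    time.sleep(3)  # give the page's JavaScript a moment to load its content

    # Wait up to 30 seconds for the <body> element to be present
    wait = WebDriverWait(driver, 30)
    body = wait.until(EC.presence_of_element_located((By.XPATH, "/html/body")))
    print(body.text)

    driver.quit()  # close the browser when done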
    

    This is the output you will get:

    We're Hiring!
    Order Food Online!
    Log in
    Find the best restaurants, cafés, and bars in Kolkata
    Kolkata
    Search
    Collections
    Explore curated lists of top restaurants, cafes, pubs, and bars in and around Kolkata, based on trends
    Trending this week
    The most popular restaurants in town this week
    New restaurants you probably haven't tried yet
    The best new places in town
    Happy hours
    Great deals on booze. Happy hours indeed
    Hookah bars
    Great places to enjoy flavored Hookah.
    All collections in Kolkata
    Order Food Online
    From the best restaurants delivering to your doorstep
    Enter your delivery location
    Detect
    Order Food Online!
    Get 15% off (up to Rs. 100) on your first order with the code EATIN when you pay online.
    Quick Searches
    Discover restaurants by type of meal
    Delivery
    Breakfast
    Lunch
    Dinner
    Drinks & Nightlife
    Cafés
    Pocket-Friendly Delivery
    Desserts & Bakes
    Popular localities in and around Kolkata
    Explore restaurants, bars, and cafés by locality
    Park Street Area (111 places)
    Sector 5, Salt Lake (102 places)
    Ballygunge (154 places)
    Sector 1, Salt Lake (167 places)
    Rajarhat New Town (212 places)
    Southern Avenue (86 places)
    Elgin (81 places)
    Prince Anwar Shah Road (108 places)
    Kankurgachi (116 places)
    Kasba (118 places)
    Camac Street Area (61 places)
    Gariahat (62 places)
    Park Circus Area (80 places)
    Desapriya Park (61 places)
    New Market Area (105 places)
    Behala (136 places)
    Hindustan Park (30 places)
    Hatibagan (70 places)
    Sector 3, Salt Lake (83 places)
    Esplanade (64 places)
    Jadavpur (91 places)
    Golpark (37 places)
    Bhawanipur (100 places)
    Science City Area (19 places)
    Theatre Road (47 places)
    Shyam Bazar (54 places)
    Garia (108 places)
    Tangra (32 places)
    Nagerbazar (60 places)
    Tollygunge (110 places)
    Top Reviewers in Kolkata
    Pamela Nandi
    446 Reviews , 245 Followers
    Follow
    Dipyaman Basu
    883 Reviews , 5854 Followers
    Follow
    Anusreea Paul
    613 Reviews , 2114 Followers
    Follow
    Krishanu Das
    832 Reviews , 1599 Followers
    Follow
    Priyabrataa Ganguly
    536 Reviews , 683 Followers
    Follow
    Reviewers leaderboard »
    Top Photographers in Kolkata
    Avijit Biswas
    698 Reviews , 1381 Followers
    Follow
    Krishanu Das
    832 Reviews , 1599 Followers
    Follow
    Mani
    153 Reviews , 4473 Followers
    Follow
    Anusreea Paul
    613 Reviews , 2114 Followers
    Follow
    Anamitraa Chakraborty
    610 Reviews , 1255 Followers
    Follow
    Photographers leaderboard »
    Top Bloggers in Kolkata
    Arghya Deep
    137 Reviews , 3498 Followers
    Follow
    Subham Ghosh
    208 Reviews , 6497 Followers
    Follow
    Snehasis
    230 Reviews , 929 Followers
    Follow
    Koninika De
    356 Reviews , 4695 Followers
    Follow
    Dipyaman Basu
    883 Reviews , 5854 Followers
    Follow
    Bloggers leaderboard»
    Looking for the Food Feed? Get the app!
    Follow foodies to see their reviews and photos in your Feed, and discover great new restaurants!
    We'll send you a link, open it on your phone to download the app
     Text App Link 
    OR
    Email App Link
    23
    COUNTRIES
    1.2M
    RESTAURANTS
    80M
    FOODIES EVERY MONTH
    30M
    PHOTOS
    10M
    REVIEWS
    18M
    BOOKMARKS
    About Careers Culture Mobile Apps Businesses Developers Blog Community Contact
    English
    Businesses
    Add a Restaurant
    Claim your Listing
    Business App
    Restaurant Widgets
    Guidelines
    Business Blog
    Advertise
    Book
    Order
    Base
    Whitelabel
    Countries
    Australia
    Brasil
    Canada
    Chile
    Czech Republic
    India
    Indonesia
    Ireland
    Italy
    Lebanon
    Malaysia
    New Zealand
    Philippines
    Poland
    Portugal
    Qatar
    Slovakia
    South Africa
    Sri Lanka
    Turkey
    UAE
    United Kingdom
    United States
    Privacy Terms Code of Conduct API Policy CSR Security Sitemap
    By continuing past this page, you agree to our Terms of Service, Cookie Policy, Privacy Policy and Content Policies. All trademarks are properties of their respective owners. © 2008-2016 - Zomato™ Media Pvt Ltd. All rights reserved.
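
A side note on the urllib2 attempt from the question: since the answer mentions that zomato filters out requests that look like bots, it may be worth checking whether a plain request with a browser-like User-Agent header and an explicit timeout behaves any differently before reaching for Selenium. The following is only a minimal sketch under that assumption; there is no guarantee that zomato will accept such a request, and it will not execute any JavaScript. The names build_request, fetch, BROWSER_USER_AGENT and the opener parameter are hypothetical helpers introduced for illustration, and the User-Agent string is just an example value.

```python
# Sketch only: runs on Python 2.7 (urllib2) and Python 3 (urllib.request).
import socket

try:  # Python 2.7, as used in the question
    import urllib2 as request_lib
    from urllib2 import URLError
except ImportError:  # Python 3
    import urllib.request as request_lib
    from urllib.error import URLError

# Assumption: an example desktop-browser User-Agent string, not a required value.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/54.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Return a Request object that carries a browser-like User-Agent header."""
    return request_lib.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10, opener=request_lib.urlopen):
    """Return the response body, or None if the request fails or times out.

    `opener` is a hypothetical hook that defaults to urlopen; it exists only
    so the function can be exercised without touching the network.
    """
    try:
        response = opener(build_request(url), timeout=timeout)
        try:
            # The error in the question was raised while reading, so the
            # read() call is kept inside the try block as well.
            return response.read()
        finally:
            response.close()
    except (URLError, socket.timeout, socket.error) as exc:
        print("Request to %s failed: %s" % (url, exc))
        return None
```

Calling fetch('https://www.zomato.com/kolkata') returns the raw HTML on success and None on a timeout or connection error, in which case the Selenium approach above remains the fallback.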