Tags: python-3.x, web-scraping, python-requests, http-status-code-404, urllib

Web-scraping Yell.com in Python


After reading a lot, I tried my first step in web scraping on the Yell website with both urllib and requests, but I get the same result in both cases (404 Not Found).

The url is:

url = 'https://www.yell.com/'

What I have tried:

  • urllib package
import urllib.request
f = urllib.request.urlopen(url)
print(f.read(100))

and

import urllib.request
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
opener.open(url)
  • requests package
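import requests
# The scheme is required: a bare 'www.yell.com' makes requests raise MissingSchema
url = 'https://www.yell.com/'
response = requests.get(url)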

and

headers = {'Accept': 'text/html'}
response = requests.get(url, headers=headers)

But in every case I get a 404 error.
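To see which status code the server actually sends, rather than only an exception traceback, note that urllib raises `urllib.error.HTTPError` for 4xx/5xx responses and its `code` attribute holds the status. The following is a minimal standard-library sketch for comparing urllib's default User-Agent with a browser-like one. The helper names `build_request` and `fetch_status` are hypothetical, and the assumption that the 404 depends on the User-Agent comes from the answer below, not from anything the site documents.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent=None):
    """Build a GET request; send an explicit User-Agent only if one is given.

    With user_agent=None, urlopen falls back to urllib's default
    "Python-urllib/3.x" agent.
    """
    headers = {} if user_agent is None else {"User-Agent": user_agent}
    return urllib.request.Request(url, headers=headers)


def fetch_status(url, user_agent=None, timeout=10):
    """Return the HTTP status code for url, including error codes such as 404.

    Network failures (urllib.error.URLError) are not caught here.
    """
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; err.code is the status.
        return err.code
```

For example, `fetch_status('https://www.yell.com/')` sends urllib's default agent, while passing `user_agent='Mozilla/5.0 (Windows NT 6.1; Win64; x64)'` sends the string used in the answer. The results depend on how the site currently treats each agent.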


Solution

  • Try this using urllib

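    import urllib.request
    
    url = 'https://www.yell.com/'
    # Send a fuller browser-like User-Agent instead of urllib's default "Python-urllib/3.x"
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'}
    request = urllib.request.Request(url, headers=headers)
    response = urllib.request.urlopen(request)
    
    # read() returns the raw response body as bytes
    print(response.read())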
    

    I would suggest using requests together with beautifulsoup4 (https://www.crummy.com/software/BeautifulSoup/bs4/doc/); it will make your scraping life easier.
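Following that suggestion, here is a minimal sketch of the same fetch done with requests and parsed with BeautifulSoup. It assumes both packages are installed (`pip install requests beautifulsoup4`) and reuses the User-Agent string from the urllib code above. `fetch_html`, `page_title` and `page_links` are hypothetical helper names, and which elements are worth extracting from Yell pages is left open.

```python
import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent reused from the urllib answer; whether the site
# keeps accepting it is an assumption.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"}


def fetch_html(url, timeout=10):
    """Download a page with requests and return its HTML as text."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx (e.g. 404) instead of failing silently.
    response.raise_for_status()
    return response.text


def page_title(html):
    """Parse HTML with BeautifulSoup and return the <title> text, or None."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else None


def page_links(html):
    """Return the href of every <a> tag that has one, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]
```

A call such as `page_title(fetch_html('https://www.yell.com/'))` would fetch the home page and return its title. Keeping the download and the parsing in separate functions lets the parsing be tried on saved HTML without hitting the site.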