python-3.x web-scraping python-requests http-status-code-404 urllib

Web-scraping Yell.com in Python

After a lot of reading, I am taking my first steps in web scraping on the Yell website using urllib and requests, but I get the same result in both cases (404 Not Found).

The url is:

url = 'https://www.yell.com/'

What I have tried:

urllib package

import urllib.request
f = urllib.request.urlopen(url)
print(f.read(100))

and

import urllib.request
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
opener.open(url)

requests package

import requests
url = 'https://www.yell.com/'  # requests needs the scheme, otherwise it raises MissingSchema
response = requests.get(url)

and

headers = {'Accept': 'text/html'}
response = requests.get(url, headers=headers)

But I still get the 404 error.

Solution

Try this using urllib, which sends a more complete browser-like User-Agent header:

import urllib.request

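url = 'https://www.yell.com/'
# Send a full browser-style User-Agent instead of the default Python one
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'}
request = urllib.request.Request(url, headers=headers)
response = urllib.request.urlopen(request)

# read() returns the raw page body as bytes
print(response.read())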
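
If the request still fails, it can help to see the status code instead of a traceback. The following is only a minimal sketch under some assumptions: the helper names build_request and fetch are made up for illustration, the 10-second timeout is an arbitrary choice, and the User-Agent string is simply the one used above. Whether the site accepts it is not guaranteed.

```python
import urllib.error
import urllib.request

# Assumed header value, copied from the snippet above; other browser strings may work too.
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"}


def build_request(url, headers=BROWSER_HEADERS):
    """Hypothetical helper: return a Request carrying browser-like headers."""
    return urllib.request.Request(url, headers=headers)


def fetch(url):
    """Hypothetical helper: return (status_code, body_bytes) without raising on HTTP errors."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=10) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # HTTPError carries the status code (e.g. 404) and can be read like a response.
        return err.code, err.read()
```

Calling fetch('https://www.yell.com/') would then return the status code together with the body, so a 404 shows up as a value you can inspect rather than as an exception.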

I would also suggest using requests together with beautifulsoup4 (https://www.crummy.com/software/BeautifulSoup/bs4/doc/); it will make your scraping life easier.
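
As a rough idea of what the requests + beautifulsoup4 combination could look like, here is a small sketch. It assumes both packages are installed (pip install requests beautifulsoup4); the function names fetch_html and extract_links are hypothetical, and the header value and timeout are the same assumptions as in the urllib version, so the site may still reject the request.

```python
import requests
from bs4 import BeautifulSoup

# Assumed header value; not guaranteed to be accepted by the site.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"}


def fetch_html(url):
    """Hypothetical helper: download a page and return its text, raising on 4xx/5xx."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return response.text


def extract_links(html):
    """Hypothetical helper: return the href of every <a> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]
```

Something like extract_links(fetch_html('https://www.yell.com/')) would then give you the links on the page, assuming the request succeeds. Keeping the parsing in its own function means you can try it on a saved HTML string without hitting the site each time.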