After reading a LOT, I have tried to do my first step in web scraping at yell website with urllib and requests but I get the same in both cases (404 not found).
The url is:
url = https://www.yell.com/
What I have tried:
import urllib.request
f = urllib.request.urlopen(url)
print(f.read(100))
and
import urllib.request
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
opener.open(url)
url = 'www.yell.com'
response = requests.get(url)
and
headers = {'Accept': 'text/html'}
response = requests.get(url, headers=headers)
But i reach to the 404 error.
Try this using urllib
import urllib.request
url = 'https://www.yell.com/'
headers = { 'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)' }
request = urllib.request.Request(url, headers=headers)
response = urllib.request.urlopen(request)
print(response.read())
I would suggest you to use requests + beautifulsoup4 https://www.crummy.com/software/BeautifulSoup/bs4/doc/ it will make your scraping life easier