I'm very new to web scraping. I want to scrape the coches.net website for a fun data analysis exercise, but the following code always returns a 403 response.
import requests
from bs4 import BeautifulSoup
import time

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
base_url = 'https://www.coches.net/segunda-mano/?pg={}&st=1'

for counter in range(1, 80):
    url = base_url.format(counter)
    # Get links
    response = requests.get(url)
    print(response.status_code)
    soup = BeautifulSoup(response.content, "html.parser")
    blocks = soup.select('.mt-Card-body')
    print(blocks)
    time.sleep(1)
I've been looking at some web pages (indeed, my code is strongly inspired by what I've found so far), and it seems like my code should be OK. Any help? How can I avoid the 403 response? Is it because of my code, or does coches.net simply not allow Python scripts to access it?
You have created headers but you don't use them. Pass them to requests.get so that your User-Agent is actually sent, and you should get a 200 status code:
response = requests.get(url, headers=headers)
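If you are going to request many pages, one option is to attach the headers once to a requests.Session instead of passing them on every call. This is only a minimal sketch: the helper names make_session, fetch_page and check_pages are made up for illustration, the 10-second timeout is an arbitrary choice, and I am assuming the site only checks the User-Agent, so it may still refuse you for other reasons.

```python
import time

import requests

# Same User-Agent string and URL pattern as in the question.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'
}
BASE_URL = 'https://www.coches.net/segunda-mano/?pg={}&st=1'


def make_session():
    """Return a Session that sends HEADERS with every request (hypothetical helper)."""
    session = requests.Session()
    session.headers.update(HEADERS)
    return session


def fetch_page(session, page, timeout=10):
    """Fetch one results page and return the Response (hypothetical helper)."""
    return session.get(BASE_URL.format(page), timeout=timeout)


def check_pages(pages, delay=1):
    """Print the status code of each page, stopping at the first non-200."""
    session = make_session()
    for page in pages:
        response = fetch_page(session, page)
        print(page, response.status_code)
        if response.status_code != 200:
            break  # stop rather than keep hitting a site that refuses us
        time.sleep(delay)
```

Calling check_pages(range(1, 4)) would try the first three pages. A Session also reuses the underlying connection between requests.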
If this helped, please mark the answer as accepted.