Tags: python, web-scraping, beautifulsoup, python-requests, screen-scraping

Error 404 with Beautifulsoup only in some urls within a site


I've been learning web scraping with Python and BeautifulSoup, but I recently ran into an issue when requesting the second page of results within a site.

Requesting the first page with this code works correctly:

import requests
from bs4 import BeautifulSoup

url = "https://PAGE_1_URL_HERE"
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
headers = {'User-Agent': user_agent}
response = requests.get(url, headers=headers)
html = response.content
soup = BeautifulSoup(html, features="html.parser")

print(response)

But the same code returns a 404 for the second page:

url = "https://PAGE_2_URL_HERE"
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
headers = {'User-Agent': user_agent}
response = requests.get(url, headers=headers)
html = response.content
soup = BeautifulSoup(html, features="html.parser")

print(response)

I've tried different headers, but I haven't been able to solve this, and I would be very grateful if anyone knows of a solution.


Solution

  • Here is an example; you just need to add the cookie from your browser:

    from bs4 import BeautifulSoup
    import requests

    url = "https://PAGE_2_URL_HERE"
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
    headers = {'User-Agent': user_agent}
    # Copy the value of the "Cookie" request header from your browser's dev tools
    cookies = {"cookie": "COPY_HERE_YOUR_COOKIE_FROM_BROWSER"}
    response = requests.get(url, headers=headers, cookies=cookies)
    print(response)  # <Response [200]> if the site accepts the cookie
    html = response.content
    soup = BeautifulSoup(html, features="html.parser")
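
As a side note, the Cookie header you copy from the browser is a single string of `name=value` pairs separated by semicolons, while `requests` expects a dict of individual cookies. A small helper can convert one to the other (a sketch; the function name and the sample cookie names/values are made up for illustration):

```python
def cookie_header_to_dict(cookie_header):
    """Split a raw "Cookie" header string into a dict usable by requests."""
    cookies = {}
    for pair in cookie_header.split(";"):
        # partition on the first "=" so values containing "=" stay intact
        name, _, value = pair.strip().partition("=")
        if name:
            cookies[name] = value
    return cookies

# Example with made-up cookie names and values:
raw = "sessionid=abc123; csrftoken=xyz789"
print(cookie_header_to_dict(raw))
# {'sessionid': 'abc123', 'csrftoken': 'xyz789'}
```

The resulting dict can be passed directly as `requests.get(url, headers=headers, cookies=cookies)`, sending each cookie under its own name rather than the whole string under a single key.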