Tags: python, async-await, python-asyncio, aiohttp

aiohttp showing 403 Forbidden but requests.get giving 200 OK response


I'm using aiohttp to asynchronously scrape some prices from a URL. Previously, I used requests.get to do the same thing synchronously. I can scrape successfully with requests.get, but the same URL throws a 403 Forbidden error when I try it with aiohttp. I've tried to figure out what the issue could be, but I haven't had any success so far. The URL matters here, because it's this site's URLs that are returning the 403 error.

I tried to disable aiohttp's URL normalization by passing yarl.URL(url, encoded=True), but it still doesn't work:

import requests
import asyncio
from aiohttp import ClientSession
import yarl

url = 'https://www.yescapa.fr/s?seatbelts=4&beds=4&km_unlimited=true&less_than_five=true&cooking=true&sink=true&fridge=true&wc=true&heating=true&types=4&longitude=-0.58046&latitude=44.84135&radius=50000&date_from=2024-08-01&date_to=2024-08-29&page=1'

res = requests.get(url)
print(res.status_code) # getting a 200 RESPONSE

async def test(url):
    async with ClientSession() as session:
        url = yarl.URL(url, encoded=True)
        async with session.request(method="GET", url=url) as response:
            return response.status # getting a 403 RESPONSE

print(asyncio.run(test(url)))

What am I doing wrong?

I hope someone can point me to a solution. Thanks.


Solution

  • Looks like the two libraries (requests and aiohttp) send different default headers; a quick way to confirm this is shown in the sketch after the code below. If I copy the headers from the successful requests call, it works:

    import requests
    import asyncio
    from aiohttp import ClientSession
    import yarl
    
    url = 'https://www.yescapa.fr/s?seatbelts=4&beds=4&km_unlimited=true&less_than_five=true&cooking=true&sink=true&fridge=true&wc=true&heating=true&types=4&longitude=-0.58046&latitude=44.84135&radius=50000&date_from=2024-08-01&date_to=2024-08-29&page=1'
    
    res = requests.get(url)
    print(res.status_code)  # getting a 200 RESPONSE
    
    headers = {
        'User-Agent': res.request.headers['User-Agent'],
        'Accept': res.request.headers['Accept'],
        'Accept-Encoding': res.request.headers['Accept-Encoding'],
        'Connection': 'keep-alive',
    }
    
    async def test(url):
        async with ClientSession(headers=headers) as session:
            url = yarl.URL(url, encoded=True)
            async with session.request(method="GET", url=url) as response:
                return response.status  # getting a 200 RESPONSE now as well
    
    print(asyncio.run(test(url)))
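
    To see why the server treats the two clients differently, you can compare the headers each library sends by default. The sketch below is only a diagnostic aid (not part of the fix) and assumes the public echo endpoint https://httpbin.org/headers is reachable; it simply returns whatever request headers it receives.

    import requests
    import asyncio
    from aiohttp import ClientSession

    echo_url = 'https://httpbin.org/headers'  # echoes the request headers back as JSON

    # requests identifies itself with a User-Agent like "python-requests/2.x"
    print(requests.get(echo_url).request.headers)

    async def show_aiohttp_headers():
        # aiohttp identifies itself with a User-Agent like "Python/3.x aiohttp/3.x"
        async with ClientSession() as session:
            async with session.get(echo_url) as response:
                print(await response.json())

    asyncio.run(show_aiohttp_headers())

    The site most likely rejects aiohttp's default User-Agent, so sending any realistic User-Agent string (not necessarily the one copied from requests) would probably work just as well.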