I'm writing a web crawler in aiohttp and have run into a problem with cookies. The server I'm trying to crawl requires authentication, and in order to fetch pages available to authenticated users I need to set a cookie with brackets in the key itself. This is a problem because aiohttp.ClientSession.cookie_jar.update_cookies either silently ignores illegal cookies:
from http.cookies import SimpleCookie
from aiohttp import ClientSession

session = ClientSession()
cookie = SimpleCookie("a[b]=1234;")
session.cookie_jar.update_cookies(cookie)
print([f for f in session.cookie_jar])  # empty list, cookie not set
or raises a CookieError:
session = ClientSession()
cookie = SimpleCookie()
cookie["a[b]"] = "1234" # http.cookies.CookieError: Illegal key 'a[b]'
session.cookie_jar.update_cookies(cookie)
print([f for f in session.cookie_jar])
session = ClientSession()
session.cookie_jar.update_cookies([("a[b]", "1234")]) # http.cookies.CookieError: Illegal key 'a[b]'
print([f for f in session.cookie_jar])
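For reference, the rejection can be reproduced without aiohttp at all, which suggests the key check lives in the standard library's http.cookies rather than in aiohttp itself. A minimal sketch using only the standard library; the helper name try_set_cookie is made up for illustration:

```python
from http.cookies import CookieError, SimpleCookie


def try_set_cookie(key: str, value: str) -> str:
    """Return the serialized cookie, or a short description of the error."""
    cookie = SimpleCookie()
    try:
        cookie[key] = value  # Morsel.set() validates the key here
    except CookieError as exc:
        return f"rejected: {exc}"
    return cookie[key].OutputString()


print(try_set_cookie("a_b", "1234"))   # legal key
print(try_set_cookie("a[b]", "1234"))  # brackets are not legal key characters
```

The first call serializes normally, while the second reports the same "Illegal key" error seen in the aiohttp examples.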
It is possible to force the cookie in by overwriting the protected member _key of http.cookies.Morsel, i.e.
session = ClientSession()
session.cookie_jar.update_cookies([("__tmp", "1234")])
for cookie in session.cookie_jar:
    if cookie.key == "__tmp":
        cookie._key = "a[b]"
print([f for f in session.cookie_jar])  # invalid cookie is set correctly
but this only pushes the problem one step back, as any session request, e.g. session.get(url), then starts raising http.cookies.CookieError.
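A likely explanation, stated as an assumption about aiohttp's internals rather than something confirmed here: the cookie is copied into a fresh Morsel when the request is built, and that copy runs the key check again. The standard-library-only sketch below shows that a Morsel with a forced _key serializes fine but is rejected as soon as it is re-set:

```python
from http.cookies import CookieError, Morsel, SimpleCookie

# Create a legal cookie, then rename it through the protected attribute.
jar = SimpleCookie()
jar["__tmp"] = "1234"
morsel = jar["__tmp"]
morsel._key = "a[b]"  # bypasses validation; relies on a CPython implementation detail

# The renamed morsel serializes on its own, since output does no validation...
serialized = morsel.OutputString()

# ...but copying it into a new Morsel validates the key again.
copy_failed = False
try:
    Morsel().set(morsel.key, morsel.value, morsel.coded_value)
except CookieError:
    copy_failed = True

print(serialized, copy_failed)
```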
I cannot get around sending this cookie. Am I stuck using non-async libraries like requests, or is there a way to ignore this issue?
I found a workaround, and while I dislike using it, it was the preferred solution over rewriting all of aiohttp:
import sys
if "http" in sys.modules:
    raise ImportError("Crawler must be imported before http module")
import http.cookies
http.cookies._is_legal_key = lambda _: True
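If patching the check for the whole process feels too broad, the same idea can be scoped with a context manager. This is only a sketch: allow_any_cookie_key is a hypothetical helper name, _is_legal_key is a private name in http.cookies that may change between Python versions, and any aiohttp requests that need the cookie would presumably have to run inside the with block (not tested against aiohttp here).

```python
import http.cookies
from contextlib import contextmanager
from http.cookies import SimpleCookie


@contextmanager
def allow_any_cookie_key():
    """Temporarily disable http.cookies' key validation (private API)."""
    original = http.cookies._is_legal_key
    http.cookies._is_legal_key = lambda _key: True
    try:
        yield
    finally:
        http.cookies._is_legal_key = original  # always restore the check


with allow_any_cookie_key():
    cookie = SimpleCookie()
    cookie["a[b]"] = "1234"  # accepted while the patch is active
    header_value = cookie["a[b]"].OutputString()

print(header_value)
```

Outside the with block the original check is back in place, so other code that relies on key validation is unaffected.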