I am trying to parse Set-Cookie headers with a regex in Python. I read RFC 6265 Section 4.1, which describes how a Set-Cookie header is built, and tried to derive a regex from that specification. This is my current state:
([\x21\x23-\x27\x2A\x2B\x2D-\x39\x41-\x5A\x5E-\x7A\x7C\x7E]+)=[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*(;[\x20](((Expires|expires)=(Mon|Tue|Wed|Thu|Fri|Sat|Sun),[\x20][0-9]{2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-[0-9]{4}[\x20][0-9]{2}:[0-9]{2}:[0-9]{2}[\x20]GMT)|((Max-Age|max-age)=[1-9]+)|((Path|path)=[\x20-\x3A\x3C-\x7E]+)|(Secure|secure)|(HttpOnly|httponly)|([\x20-\x3A\x3C-\x7E]*)))*
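One way to keep a regex like this manageable is to assemble it from named fragments that mirror the RFC 6265 Section 4.1.1 grammar. This is a minimal sketch, not a complete validator. The fragment names (TOKEN, COOKIE_OCTET, COOKIE_PAIR, COOKIE_AV) and the helper parse_pair are hypothetical, and every attribute is matched only by the generic "any CHAR except CTLs or ;" rule. One detail differs from the regex above. The cookie-name class there uses the range \x2D-\x39, which includes "/" (0x2F). RFC 2616 Section 2.2 lists "/" as a separator, so it is not allowed in a token, and the sketch leaves it out.

```python
import re
from typing import Optional, Tuple

# token (RFC 2616 Section 2.2): CHARs except CTLs and separators.
# '/' (0x2F) is a separator, so 0x2D-0x2E and 0x30-0x39 are listed separately.
TOKEN = r"[\x21\x23-\x27\x2A\x2B\x2D\x2E\x30-\x39\x41-\x5A\x5E-\x7A\x7C\x7E]+"
# cookie-octet (RFC 6265 Section 4.1.1)
COOKIE_OCTET = r"[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]"
# cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
# (quoted form first, so the unquoted form does not match an empty string early)
COOKIE_VALUE = '"' + COOKIE_OCTET + '*"|' + COOKIE_OCTET + "*"
# cookie-pair = cookie-name "=" cookie-value
COOKIE_PAIR = "(?P<name>" + TOKEN + ")=(?P<value>" + COOKIE_VALUE + ")"
# cookie-av, approximated by extension-av: any CHAR except CTLs or ";"
COOKIE_AV = r"[\x20-\x3A\x3C-\x7E]*"
# set-cookie-string = cookie-pair *( ";" SP cookie-av )
SET_COOKIE_STRING = re.compile(COOKIE_PAIR + "(?:; " + COOKIE_AV + ")*")


def parse_pair(cookie: str) -> Optional[Tuple[str, str]]:
    """Return (name, value) if `cookie` matches one set-cookie-string, else None."""
    match = SET_COOKIE_STRING.fullmatch(cookie)
    return (match.group("name"), match.group("value")) if match else None
```

With the fragments named after the grammar rules, fixing one rule (such as the token class) does not require editing one very long pattern.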
I have problems with the recursive definition of the subdomain in the Domain attribute (domain=...), which is described in RFC 1034 Section 3.5, and I need help expressing it as a regex.
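The recursion in RFC 1034 Section 3.5 (<subdomain> ::= <label> | <subdomain> "." <label>) only ever adds "." <label> on the right. It therefore describes the same strings as one label followed by zero or more "." <label> groups, which needs no recursion and fits a regex directly. The following is a minimal sketch that assumes the plain RFC 1034 label rule: a letter first, then letters, digits or hyphens, no trailing hyphen, and at most 63 characters. The names LABEL, SUBDOMAIN and DOMAIN_AV are illustrative. The sketch also accepts an optional leading dot, as in domain=.youtube.com. RFC 6265 Section 4.1.2.3 says user agents ignore a leading dot even though the grammar does not allow one. ABNF string literals are case-insensitive, so re.IGNORECASE covers both Domain= and domain=.

```python
import re

# RFC 1034 Section 3.5:
#   <subdomain> ::= <label> | <subdomain> "." <label>
#   <label>     ::= <letter> [ [ <ldh-str> ] <let-dig> ]
# The recursion only appends "." <label> on the right, so it is equivalent
# to <label> ( "." <label> )*.
# Labels are at most 63 characters: 1 first char + up to 61 + 1 last char.
LABEL = r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
SUBDOMAIN = LABEL + r"(?:\." + LABEL + r")*"

# domain-av = "Domain=" domain-value (RFC 6265 Section 4.1.1), case-insensitive.
# The optional leading "." is tolerated here (RFC 6265 Section 4.1.2.3 says
# user agents ignore it) and kept out of the captured domain.
DOMAIN_AV = re.compile(r"Domain=\.?(?P<domain>" + SUBDOMAIN + ")", re.IGNORECASE)
```

RFC 6265 refers to RFC 1123 Section 2.1 as an enhancement of this rule, and that section allows a label to start with a digit. To accept such labels, change the first [A-Za-z] in LABEL to [A-Za-z0-9].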
My existing regex also does not work completely as expected. For example, this Set-Cookie header
VISITOR_INFO1_LIVE=M_6WYFFF_fo; path=/; domain=.youtube.com; secure; expires=Tue, 07-Jul-2020 00:17:35 GMT; httponly; samesite=None, GPS=1; path=/; domain=.youtube.com; expires=Thu, 09-Jan-2020 00:47:35 GMT, YSC=8sXes3YfFFF; path=/; domain=.youtube.com; httponly, VISITOR_INFO1_LIVE=M_6WYFFF_fo; path=/; domain=.youtube.com; secure; expires=Tue, 07-Jul-2020 00:17:35 GMT; httponly; samesite=None
contains 4 cookies (VISITOR_INFO1_LIVE twice, GPS, and YSC), but my regex only catches 3 of them (the YSC cookie is missing). I tested this on https://regex101.com/
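The example string is really several cookies joined with ", ". Some HTTP client libraries fold repeated Set-Cookie headers this way, although RFC 6265 Section 3 says origin servers SHOULD NOT fold them, because the comma conflicts with Set-Cookie syntax. The Expires dates also contain ", ". In addition, the last alternative of the regex above accepts commas, spaces and "=", so text belonging to the next cookie can be absorbed into the previous cookie's last attribute. The sketch below separates the cookies first. It assumes that a new cookie starts wherever a comma is followed by something shaped like name=. COOKIE_BOUNDARY and split_folded_set_cookie are hypothetical names.

```python
import re
from typing import List

# A comma, optional whitespace, then something shaped like "cookie-name="
# starts a new cookie. The comma in an Expires date ("Tue, 07-Jul-2020 ...")
# is followed by a date and a space, never by "=", so it is not a split point.
COOKIE_BOUNDARY = re.compile(r",\s*(?=[^;,=\s]+=)")


def split_folded_set_cookie(header: str) -> List[str]:
    """Split a comma-folded Set-Cookie string into one string per cookie."""
    return [part.strip() for part in COOKIE_BOUNDARY.split(header) if part.strip()]
```

This is a heuristic. A cookie value or attribute that itself contained a comma followed by name= would be split in the wrong place.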
Later I want to parse many Set-Cookie headers to get the cookie names (what the RFC calls cookie-name).
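When collecting names from many headers, it is simpler to take each Set-Cookie value separately where possible, because then no comma splitting is needed. For example, http.client.HTTPMessage inherits get_all("Set-Cookie") from email.message.Message. The cookie-pair always comes first in a set-cookie-string, so the name is the text before the first "=". This is a minimal sketch. count_cookie_names is a hypothetical helper, and its name pattern is deliberately looser than the RFC token rule.

```python
import re
from collections import Counter
from typing import Iterable

# cookie-pair comes first in a set-cookie-string, so the cookie-name is the
# text before the first "=" (looser than the RFC token rule, on purpose).
NAME_RE = re.compile(r"\s*([^=;\s]+)=")


def count_cookie_names(set_cookie_values: Iterable[str]) -> Counter:
    """Count cookie names over individual (not comma-folded) Set-Cookie values."""
    names: Counter = Counter()
    for value in set_cookie_values:
        match = NAME_RE.match(value)
        if match:  # values that do not start with name= are skipped
            names[match.group(1)] += 1
    return names
```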
Thanks for any help!
Short answer, since you asked how to parse the cookies with a regex:
([^;]+);?
Then iterate through the matches.
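Here is a small illustration of that approach on a single cookie. SEGMENT_RE and segments are illustrative names. Each match is one ;-separated segment: the first is the cookie-pair, and the rest are attributes.

```python
import re
from typing import List

# The pattern from the answer: one match per ";"-separated segment.
SEGMENT_RE = re.compile(r"([^;]+);?")


def segments(set_cookie: str) -> List[str]:
    """Return the stripped ;-separated segments of one Set-Cookie value."""
    return [m.group(1).strip() for m in SEGMENT_RE.finditer(set_cookie)]


# The first segment is the cookie-pair; the remaining ones are attributes.
pair, *attributes = segments("YSC=8sXes3YfFFF; path=/; domain=.youtube.com; httponly")
```

[^;] also matches commas. On the comma-folded string from the question, this therefore returns segments such as samesite=None, GPS=1, so folded headers have to be split into individual cookies first.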
The way you have formulated the question indicates that you would also like to validate the cookies and probably also separate them.
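Here is a sketch of the validate-and-separate step, assuming each cookie has already been isolated. The cookie is split at ";" into the cookie-pair and its attributes, and a few attribute values are checked against their RFC 6265 Section 4.1.1 rules. VALUE_RULES and parse_cookie are hypothetical names, and only Expires, Max-Age and Path get a value check here. Two rules differ from what the question's regex checks. First, sane-cookie-date is an RFC 1123 date with spaces (Tue, 07 Jul 2020 00:17:35 GMT), so the hyphenated dates in the question's example are reported as problems. Second, max-age-av is non-zero-digit *DIGIT, which is [1-9][0-9]*. The question's [1-9]+ would only match the 36 of 3600.

```python
import re
from typing import Dict, List, Optional, Tuple

# RFC 6265 Section 4.1.1 value rules for a few cookie-av entries.
# sane-cookie-date = rfc1123-date, e.g. "Tue, 07 Jul 2020 00:17:35 GMT".
DAY = r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)"
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
VALUE_RULES = {
    "expires": re.compile(
        DAY + r", [0-9]{2} " + MONTH + r" [0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2} GMT"
    ),
    "max-age": re.compile(r"[1-9][0-9]*"),        # non-zero-digit *DIGIT
    "path": re.compile(r"[\x20-\x3A\x3C-\x7E]+"),  # any CHAR except CTLs or ";"
}
FLAGS = {"secure", "httponly"}  # attributes that carry no value


def parse_cookie(
    cookie: str,
) -> Tuple[str, str, Dict[str, Optional[str]], List[str]]:
    """Separate one cookie into name, value, attributes and invalid attributes."""
    pair, *avs = [segment.strip() for segment in cookie.split(";")]
    name, _, value = pair.partition("=")
    attrs: Dict[str, Optional[str]] = {}
    problems: List[str] = []
    for av in avs:
        key, _, val = av.partition("=")
        key = key.lower()  # compare attribute names case-insensitively
        if key in FLAGS:
            attrs[key] = None
            continue
        rule = VALUE_RULES.get(key)
        if rule is not None and not rule.fullmatch(val):
            problems.append(av)  # keep the original text for reporting
        attrs[key] = val
    return name, value, attrs, problems
```

User agents are more lenient than this. RFC 6265 Section 5.1.1 describes a more forgiving date-parsing algorithm for them, so a strict check like this mainly serves to flag servers that do not follow Section 4.1.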