python regex cookies rfc6265

Regex for parsing set-cookie headers


I am trying to parse Set-Cookie headers with a regex in Python. I read RFC 6265, Section 4.1, which describes how the Set-Cookie header is built, and tried to derive a regex from the specification. This is my current state:

([\x21\x23-\x27\x2A\x2B\x2D-\x39\x41-\x5A\x5E-\x7A\x7C\x7E]+)=[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*(;[\x20](((Expires|expires)=(Mon|Tue|Wed|Thu|Fri|Sat|Sun),[\x20][0-9]{2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-[0-9]{4}[\x20][0-9]{2}:[0-9]{2}:[0-9]{2}[\x20]GMT)|((Max-Age|max-age)=[1-9]+)|((Path|path)=[\x20-\x3A\x3C-\x7E]+)|(Secure|secure)|(HttpOnly|httponly)|([\x20-\x3A\x3C-\x7E]*)))*

I have problems with the recursive definition of the subdomain in the Set-Cookie header (domain=...), which is described in RFC 1034, Section 3.5, and I need help expressing it as a regex.

My current regex also does not work entirely as expected. For example, this Set-Cookie header

VISITOR_INFO1_LIVE=M_6WYFFF_fo; path=/; domain=.youtube.com; secure; expires=Tue, 07-Jul-2020 00:17:35 GMT; httponly; samesite=None, GPS=1; path=/; domain=.youtube.com; expires=Thu, 09-Jan-2020 00:47:35 GMT, YSC=8sXes3YfFFF; path=/; domain=.youtube.com; httponly, VISITOR_INFO1_LIVE=M_6WYFFF_fo; path=/; domain=.youtube.com; secure; expires=Tue, 07-Jul-2020 00:17:35 GMT; httponly; samesite=None

contains 4 cookies (VISITOR_INFO1_LIVE twice, GPS, and YSC), but my regex only catches 3 of them (the YSC cookie is missing). I tested this on https://regex101.com/

Later I want to parse many Set-Cookie headers to get the names of the cookies (what the RFC calls cookie-name).

Thanks for your help!


Solution

  • Short answer, as you asked how to parse the cookies with regex:

    ([^;]+);?
    

    Then iterate through the matches.

    The way you phrased the question suggests that you also want to validate the cookies and probably separate them as well.
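As a rough illustration of the short pattern above, here is a minimal Python sketch that separates a comma-joined header into individual cookies and then pulls out each cookie-name. The helper names `split_cookies` and `parse_cookie` and the `COOKIE_BOUNDARY` pattern are assumptions made for this example, not part of RFC 6265. The boundary pattern is a heuristic: it treats a comma as a cookie separator only when it is followed by something that looks like `name=`, which keeps the comma inside `expires=Thu, 09-Jan-2020 ...` intact. It does no validation.

```python
import re

# Pattern from the short answer: one capture per ';'-separated segment.
SEGMENT = re.compile(r"([^;]+);?")

# Heuristic (an assumption of this sketch): a comma separates two cookies
# only if it is followed by a token that ends in '=' (a new cookie-name).
# The comma inside an Expires date is followed by "09-Jan-2020 ", so it is skipped.
COOKIE_BOUNDARY = re.compile(r",\s*(?=[^;,=\s]+=)")


def split_cookies(combined):
    """Split a comma-joined Set-Cookie value into one string per cookie."""
    return [part.strip() for part in COOKIE_BOUNDARY.split(combined) if part.strip()]


def parse_cookie(cookie):
    """Return (name, value, attributes) for a single cookie string."""
    segments = [s.strip() for s in SEGMENT.findall(cookie) if s.strip()]
    if not segments:
        raise ValueError("empty cookie string")
    # The first segment is the name=value pair; the rest are attributes.
    name, _, value = segments[0].partition("=")
    return name, value, segments[1:]


# Shortened version of the header from the question.
header = (
    "GPS=1; path=/; expires=Thu, 09-Jan-2020 00:47:35 GMT, "
    "YSC=8sXes3YfFFF; path=/; httponly"
)

names = [parse_cookie(cookie)[0] for cookie in split_cookies(header)]
print(names)  # ['GPS', 'YSC']
```

The heuristic can misfire if an attribute value itself contains a comma followed by `something=`, so when the raw response is available it is safer to read each Set-Cookie header line separately instead of a joined string.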