python-requests, aiohttp

requests.Session() is processing a cookie in an unexpected way (mangling JSON)


I populate a session cookie server-side in the response to a client request. Over the wire, the response looks like the capture below; you can see that mycookie holds JSON with escaped quotes (and the comma escaped as \054):

21:13:54.006488 IP (tos 0x0, ttl 64, id 45515, offset 0, flags [DF], proto TCP (6), length 303, bad cksum 0 (->89fb)!)
    localhost.http-alt > localhost.57738: Flags [P.], cksum 0xff23 (incorrect -> 0x13f5), seq 1:252, ack 247, win 12751, options [nop,nop,TS val 1223327230 ecr 1223325750], length 251
    0x0000:  4500 012f b1cb 4000 4006 0000 7f00 0001  E../..@.@.......
    0x0010:  7f00 0001 1f90 e18a e6ce bb1d 282c d580  ............(,..
    0x0020:  8018 31cf ff23 0000 0101 080a 48ea 7dfe  ..1..#......H.}.
    0x0030:  48ea 7836 4854 5450 2f31 2e31 2032 3030  H.x6HTTP/1.1.200
    0x0040:  204f 4b0d 0a43 6f6e 7465 6e74 2d54 7970  .OK..Content-Typ
    0x0050:  653a 2061 7070 6c69 6361 7469 6f6e 2f6a  e:.application/j
    0x0060:  736f 6e0d 0a43 6f6e 7465 6e74 2d4c 656e  son..Content-Len
    0x0070:  6774 683a 2032 330d 0a53 6574 2d43 6f6f  gth:.23..Set-Coo
    0x0080:  6b69 653a 2070 6965 6b61 726d 613d 227b  kie:.mycookie="{
    0x0090:  5c22 6372 6561 7465 645c 223a 2031 3438  \"created\":.148
    0x00a0:  3132 3331 3633 325c 3035 3420 5c22 7365  1231632\054.\"se
    0x00b0:  7373 696f 6e5c 223a 207b 5c22 7573 6572  ssion\":.{\"user
    0x00c0:  5c22 3a20 5c22 686c 6565 6e65 795c 227d  \":.\"my_name\"}
    0x00d0:  7d22 3b20 4874 7470 4f6e 6c79 3b20 5061  }";.HttpOnly;.Pa
    0x00e0:  7468 3d2f 0d0a 4461 7465 3a20 5468 752c  th=/..Date:.Thu,
    0x00f0:  2030 3820 4465 6320 3230 3136 2032 313a  .08.Dec.2016.21:
    0x0100:  3133 3a35 3120 474d 540d 0a53 6572 7665  13:51.GMT..Serve
    0x0110:  723a 2050 7974 686f 6e2f 332e 3420 6169  r:.Python/3.4.ai
    0x0120:  6f68 7474 702f 312e 312e 360d 0a0d 0a    ohttp/1.1.6....
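The value in the capture is the standard quoting used by Python's http.cookies module: the JSON is wrapped in double quotes, each inner `"` becomes `\"` and the comma becomes the octal escape `\054`. As a minimal sketch, feeding the Set-Cookie text from the capture (transcribed by hand from the hex column) back into `http.cookies.SimpleCookie` recovers valid JSON, which suggests the bytes on the wire are not what gets mangled. `RAW_SET_COOKIE` is just a local name chosen for this example.

```python
import json
from http.cookies import SimpleCookie

# Set-Cookie header value from the capture, decoded from the hex column.
RAW_SET_COOKIE = (
    'mycookie="{\\"created\\": 1481231632\\054 \\"session\\": '
    '{\\"user\\": \\"hleeney\\"}}"; HttpOnly; Path=/'
)

jar = SimpleCookie()
jar.load(RAW_SET_COOKIE)  # parses the quoted value plus its attributes
morsel = jar["mycookie"]

# morsel.value is the unquoted value: \" -> " and \054 -> ,
data = json.loads(morsel.value)
print(data)                                # {'created': 1481231632, 'session': {'user': 'hleeney'}}
print(morsel["path"], morsel["httponly"])  # / True
```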

I use the following requests code to get the cookie:

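import requests
domain = 'http://localhost:8080/'  # assumption: the aiohttp server from the capture (port 8080)

with requests.Session() as s:
    r = s.post(domain + 'login')
    c = s.cookies['mycookie']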

And c looks like '"{created: 1481233488\054 session: {user: hleeney}}"'

c[0] is '"', i.e. the value still carries its surrounding double quotes.
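The observed c can be reproduced offline, using the values from the capture. `http.cookiejar` stores the Set-Cookie value verbatim (still quoted and escaped); then, as far as I can tell from the requests source, `RequestsCookieJar.set_cookie` in requests/cookies.py deletes every `\"` pair from any value that starts and ends with a double quote. The function `requests_style_mangle` below is a hypothetical re-implementation of just that check so it runs without a server; it is not a requests API.

```python
# Cookie value as http.cookiejar keeps it: still quoted, still escaped.
wire_value = '"{\\"created\\": 1481231632\\054 \\"session\\": {\\"user\\": \\"hleeney\\"}}"'


def requests_style_mangle(value):
    # Hypothetical mirror of the check in RequestsCookieJar.set_cookie:
    # a value wrapped in double quotes has every backslash-quote pair removed.
    if value.startswith('"') and value.endswith('"'):
        value = value.replace('\\"', '')
    return value


c = requests_style_mangle(wire_value)
print(c)     # "{created: 1481231632\054 session: {user: hleeney}}"
print(c[0])  # "
```

The outer quotes and the `\054` survive because only `\"` pairs are removed, which matches the shape of the value reported in the question.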

I'm using aiohttp on the server side:

import json
from aiohttp import web
response = web.Response(...)
response.set_cookie('mycookie', json.dumps({"session": {...}}))  # set_cookie(name, value)
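To see where the escaping comes from on the server side, here is a sketch that does the quoting directly with the standard library. It assumes that aiohttp's `set_cookie` stores cookies in an `http.cookies.SimpleCookie` (true for the aiohttp versions I know of, but treat it as an assumption); the session data is taken from the capture.

```python
import json
from http.cookies import SimpleCookie

# Assumption: aiohttp keeps response cookies in a SimpleCookie, so this
# produces the same Set-Cookie text the server put on the wire.
jar = SimpleCookie()
jar["mycookie"] = json.dumps({"created": 1481231632, "session": {"user": "hleeney"}})
jar["mycookie"]["httponly"] = True
jar["mycookie"]["path"] = "/"

# The JSON contains characters outside the "legal" set, so it gets quoted:
print(jar["mycookie"].coded_value)
# "{\"created\": 1481231632\054 \"session\": {\"user\": \"hleeney\"}}"
print(jar.output())  # Set-Cookie: mycookie=...; HttpOnly; Path=/
```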

I'm not sure who to blame :D Can anyone help?


Solution

  • To answer 'who to blame' here is difficult. It's probably the user (me), for being a bit ignorant; if he were smarter he would never have run into this problem. But it could also be any of the parties below, depending on your point of view. It's an interesting case study in the software development life cycle and standards.

    1) The authors of requests: There is indeed a funny line of code in the requests library that was mangling the JSON. At the time of writing, RequestsCookieJar overrides set_cookie from http/cookiejar.py and, for any value wrapped in double quotes, deletes every backslash-quote (\") pair before the cookie is stored and returned across the API. Now, the requests guys are really helpful and very cool. They acknowledge this flaw/sub-optimal implementation, although from one perspective it does not really defy RFC 6265 (which supposedly standardises cookie values). My take is that the flaw probably supports a 'feature' for compatibility with some server-side cookie code somewhere. The module in which the flaw exists is earmarked for obsolescence, so a fix, with a potential interim backwards-compatibility issue at a minor version number, is reasonably deemed undesirable and a waste.

    2) The authors of aiohttp_session: Well gosh darn, these are the guys who are putting JSON into the cookie value!!! They are at fault... aren't they? Well, it's again complex. Their intent is to provide a simple API for secure sessions using aiohttp as a server, and they provide a few storage implementations. The one intended for live use is an encrypted cookie that stores the session data as an encrypted JSON string. When it is encrypted there are no issues encoding/decoding the cookie, since the stored token no longer contains JSON's quotes and commas. Of course, that cookie is not intended to be read on the client side, so the session never exists there as JSON and raw JSON never goes over the wire. Another implementation they provide is a 'Simple' session storage. Here they forgo the encryption and transmit the session as a raw JSON string. This is problematic because JSON isn't really supposed to be transmitted in a cookie value (see 3 below). However, the simple storage is only meant for testing, not for live use. It still might be better to provide a simple storage that doesn't potentially blow up other APIs, but having that implementation (JSON without the encryption) probably provides some valuable test-coverage scenarios.

    3) The authors of RFC 6265: This RFC was supposed to be definitive in specifying the cookie standard. It sure is better than what preexisted, but I'm not convinced it's definitive. The spec for the cookie value is just a bit weird and picky IMHO. For one, the English below is open to slight misinterpretation; for two, there seems to be a typo in the omission of a comma (after "whitespace"); and for three... well, again it's weird and picky IMHO (the H here stands for ignorant, because I'm fairly sure they know why it makes sense).

        cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
        cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                     ; US-ASCII characters excluding CTLs,
                     ; whitespace DQUOTE, comma, semicolon,
                     ; and backslash

    With the way things are these days, storing JSON in a cookie does not sound crazy, and people may want to do it notwithstanding the potential security holes. The Python HTTP APIs seem to me to turn a blind eye to non-compliant cookie values: they escape DQUOTE and send the backslashes along with it. Anyhow, not my soapbox.

    4) The user: Me. Starting out on this journey, I was ignorant of all of the above: the cookie standards and their history, the Python HTTP implementation, the requests implementation and the aiohttp_session implementation. I was needlessly testing the plaintext value of the cookie on the client side, although someone may have a genuine reason for doing this in the future. I kinda randomly selected requests for the client-side stuff too, and so deserved to have to delve into its source.

    So, in closing and in jest, I blame puny humanity for this one: smart enough to create complexity, but not smart enough to avoid SDLC problems like this.
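Points 2 and 3 above come down to which characters RFC 6265 allows in a cookie value. The sketch below transcribes the cookie-value grammar into a regular expression and checks a raw JSON value against it, then shows one way to keep JSON inside the allowed set: URL-safe base64, whose alphabet (A-Z, a-z, 0-9, -, _, =) consists only of cookie-octets. `is_rfc6265_value`, `encode_session` and `decode_session` are hypothetical helpers; this is plain base64 swapped in for illustration, not aiohttp_session's API, and unlike the encrypted storage it hides nothing.

```python
import base64
import json
import re

# cookie-value from RFC 6265 section 4.1.1, transcribed as a regex:
#   cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
#   cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
_COOKIE_OCTET = r"[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]"
_COOKIE_VALUE = re.compile(rf'{_COOKIE_OCTET}*|"{_COOKIE_OCTET}*"')


def is_rfc6265_value(value: str) -> bool:
    """True if value matches the cookie-value production exactly."""
    return _COOKIE_VALUE.fullmatch(value) is not None


def encode_session(data: dict) -> str:
    # Hypothetical helper: compact JSON, then URL-safe base64, so the
    # result never needs quoting or backslash escapes.
    raw = json.dumps(data, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_session(value: str) -> dict:
    # Hypothetical inverse of encode_session.
    return json.loads(base64.urlsafe_b64decode(value.encode("ascii")))


session = {"created": 1481231632, "session": {"user": "hleeney"}}
plain = json.dumps(session)
encoded = encode_session(session)

print(is_rfc6265_value(plain))             # False: DQUOTE, comma and spaces
print(is_rfc6265_value(encoded))           # True
print(decode_session(encoded) == session)  # True
```

The same check also rejects the quoted form Python's cookie module puts on the wire, because the backslash (%x5C) is not a cookie-octet.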