Why are HTTPS response bodies logged as bytes in my MITM proxy script?


I'm using a Python script to create a man-in-the-middle (MITM) proxy for intercepting HTTPS traffic. The script captures and logs the requests and responses to a log file. While the headers of the HTTPS requests and responses are readable as plain text, the bodies are logged as byte strings, making them difficult to interpret.

Here’s the relevant part of my script that handles the HTTPS request and response:

def relay_data(self, s_ssl, conn_ssl, buffer_size, url):
    website_response = []
    while True:
        try:
            request = conn_ssl.recv(buffer_size)
            if not request:
                break
            s_ssl.sendall(request)
        except socket.error:
            pass
        try:
            response = s_ssl.recv(buffer_size)
            if not response:
                break
            conn_ssl.sendall(response)
            website_response.append(response)
        except socket.error:
            pass
    if website_response:
        log_response_data(website_response, "https")
    print(f"Request completed (HTTPS) [{url}]")

def log_response_data(website_response, protocol):
    formatted_response, demarcation = "", "____________________________________________________________________________________________________"
    for response in website_response:
        try:
            formatted_response += response.decode('utf-8')
        except UnicodeDecodeError:
            # Not valid UTF-8 (e.g. a compressed body): fall back to the bytes repr, b'...'
            formatted_response += str(response)
    if formatted_response:
        with open(f"{protocol}_log_file", "a") as F:
            F.write(f"{demarcation}\n{formatted_response}\n{demarcation}\n\n")

The relay_data function collects the HTTPS responses in the website_response list and then logs them using log_response_data. The logging function tries to decode each received chunk as UTF-8. However, since HTTPS response bodies often contain binary data (e.g., images, files, encrypted content), the decoding fails, and the log ends up containing raw byte strings, as follows (the response below is truncated):

______________________________________________________________________________________________________________________
b'HTTP/1.1 200 OK
Vary: Accept-Encoding
Content-Encoding: br
Content-Type: text/css; charset=utf-8
Access-Control-Allow-Origin: *
Last-Modified: Mon, 01 Jan 2001 08:00:00 GMT
Expires: Mon, 07 Jul 2025 16:39:40 GMT
Cache-Control: public,max-age=31536000,immutable
reporting-endpoints: permissions_policy="https://www.xx.facebook.com/ajax/browser_error_reports/"
timing-allow-origin: *
document-policy: force-load-at-top
permissions-policy: accelerometer=(), attribution-reporting=(), autoplay=(), battery=(self), bluetooth=(), camera=(), ch-device-memory=(), ch-downlink=(), ch-dpr=(), ch-ect=(), ch-rtt=(), ch-save-data=(), ch-ua-arch=(), ch-ua-bitness=(), ch-viewport-height=(), ch-viewport-width=(), ch-width=(), clipboard-read=(), clipboard-write=(), compute-pressure=(), display-capture=(), encrypted-media=(), fullscreen=(self), gamepad=(), geolocation=(), gyroscope=(), hid=(), idle-detection=(), interest-cohort=(), keyboard-map=(), local-fonts=(), magnetometer=(), microphone=(), midi=(), otp-credentials=(), payment=(), picture-in-picture=(), private-state-token-issuance=(), publickey-credentials-get=(), screen-wake-lock=(), serial=(), shared-storage=(), shared-storage-select-url=(), private-state-token-redemption=(), usb=(), usb-unrestricted=(), unload=(self), window-management=(), xr-spatial-tracking=();report-to="permissions_policy"
cross-origin-resource-policy: cross-origin
X-Content-Type-Options: nosniff
report-to: {"max_age":21600,"endpoints":[{"url":"https:\/\/www.xx.facebook.com\/ajax\/browser_error_reports\/"}],"group":"permissions_policy"}
content-md5: yBQVMFwk1cIEUVIB4cPFJw==
X-FB-Debug: FVUkfTRg3dvvwbDFhD8Xj5Bxk8qMudZ3UR/oe+x9HEIHxg+Wh2aQAJqhqUReqxiHUgi/KHRKaUjPsQ3bnnIrkg==
Date: Mon, 08 Jul 2024 11:33:02 GMT
X-FB-Connection-Quality: MODERATE; q=0.3, rtt=152, rtx=0, c=13, mss=1368, tbw=2569, tp=-1, tpl=-1, uplat=1, ullat=-1
Alt-Svc: h3=":443"; ma=86400
Connection: keep-alive
Content-Length: 10118

'b'\xe2'b'\x1d\x96\x88\xa2>\x04(B\x86\xb9G\x7fi\xdf\x7f~\xbe,U\xa36\xfb\xc0\xe3\x1b\x1b\xa4j\xa7\x9d\xe9\xbc\xee\xf6\xda\xc9\x1e72\xd8$N\xcc\x11s\x84\x14q\xbd\x99\xf6\xa64\x00\xe4\xdc\x99\xec\xe7\xb2\x95+\"\x00\xca\xf9L\xa9\xf1A\xc26\xef\xd5\xcd\xec\x02U\x0bW\x05\xba\xaa#\xbfs\xb4\x92\xef\xd7\xdd3\xdc}3\xa0\xb0\x86\x12\xb9\xcbc\x1d\x16<\xe3\xe9d\x9cM\x7f\x96\xc8\xd9TY|A\xa8R\x12)HB=\x86\xea\xcb\xdeo\x14\t\"\xf2\xcc\xb6e\xb8l\xf4d\xe2\xf7H\xb0@\x91\x90\xf0\xbbp\xd3\xe8\xe9\xe0\xaf\x050\xbfz\xb6\x94\xdcd\xeb\xadq\x92\x80xx\x17\x86\xb6K\xd7\xa6`\xa11r\x01uU\xc5+Q%\xb3\xf4\xe9\x14\x81\xee*A\x93\xbc\xa3\x8c\x97+\xec\x8e}\xda\xbbg\x8bv\x16\x04?*E\x80v3\x90\xb7?!\xec\x0b}\x87\"\xb0\x9d\x83t\xdb{\xb7\x11!\xe8\xc23c\x91^\xfc\xe1\xc1\xceC\xcc\xa3\x9d\xb4\xc3\x10\xc5-\xb5\xe0\xd5K\xaa\xcf\x03\xd9\xff<\xf0\xe1\xde}\xdat\x04%\x14\xc01L}\x1c`\xef\xc3\xe8b8\xf2\xcf\xc5\x02\t6J/ \x03\x9e9\t\x1e/<R\xb3(\xd8\x12\x12\xd4q\xf6\xbb\'\x00\x16\xd6v6\xbc\x8d\xc5\xc2\x8a%Y\x9a\x83\xa5\x8ad\xfb\xd9(\x16r\xa6t\xc2^\xee\xa51\xde\xd8m\x05\x1d/\x84DA\xc4\xc0Z\xd6\xdb x\xff\x1e\xd6\xcd\x06\xe6\x1c\x8fb\xa1\xc2F\xcc\x90\xf90\xc3c\x15}\xbb\xb5_\xd8j0e\xd6\xd0)}\x81\x9c:\xf30\xac\x8f\x01\xe8\xea}Sx\xc8\x03\xb8\x86X<\xa3*)4\xac\x08\xce\xc2\xde\xa8\x87\x1d5I7\xd7\xdf\xde\xdc\x88\xf7\xdf\x03\xc0<>\x15\xb0\xd6e7\xd2\x98\xb9$\xfd\xdd\xfa\xc9\xbe_\x0f\xde\xech#p\xd9/\x15\x81o\xc1i\x1f\x01\xf9\x99]\x01\x0e\xf7\xd5FC\xec\xb2xt\x11\x88\xa5E\'\xbe\xf4w\x8b\xc0\x83w\xcd\xe9U\x97\xbb\xfb\xdf\xf9\xa9m\x86\x08\xdc\xc2\xdd\xd3\xdb\xee\xaf|\xfc\xcb\xfe\xb2\xfb\xd1\xebp\xdb`\xf7\x8f\xe0\x87K\xdb\xd0\xdf\xf2vg\xf7\xad\x05\x7f{\x06\xbf[\xe1\xb2\xdb\xac\x0fe*\x016\xeen\x11\xbf\xe4v\x9b\x8d\x8d\x1b\xf0e,L\xfe\x9b\x97\xfe\xed7NQ\xcc+\x16\x81\xbf\xfd\x86\x91F1F\xd5+\x96!\xe93?'........
______________________________________________________________________________________________________________________

My questions are:

  1. Why do the HTTPS response bodies appear as bytes in my log file?
  2. How can I decode/decrypt the byte data in my log file?
  3. Should I distinguish between different content types and handle them separately? If so, how can I implement this in my script?

Any guidance on improving the readability of my HTTPS response logs would be greatly appreciated!


Solution

  • Content-Encoding: br
    

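    The body is not "logged as bytes" in any special sense: it is simply compressed data. As the Content-Encoding: br response header shows, the server compressed it with Brotli, which is a compression scheme, not encryption. The headers are already readable in the log, so the TLS layer has been removed by the proxy and nothing is left to decrypt.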
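To check this against captured data, the following minimal sketch joins the received chunks, splits the header block from the body, and reads the Content-Encoding field. The function name split_http_message and the sample chunks are illustrative and not part of the original script. It assumes the chunks hold exactly one complete HTTP/1.1 response, which will not be true on a keep-alive connection that carries several responses.

```python
def split_http_message(chunks):
    """Join recv() chunks and split one HTTP/1.1 message into parts.

    Returns (status_line, headers, body), where headers maps
    lower-cased field names to their values and body is raw bytes.
    """
    raw = b"".join(chunks)
    # The header block ends at the first empty line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    # iso-8859-1 maps every byte to a character, so this never raises.
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Hypothetical sample: a header block split across two recv() chunks.
chunks = [
    b"HTTP/1.1 200 OK\r\nContent-Encoding: br\r\n",
    b"Content-Length: 3\r\n\r\n\xe2\x1d\x96",
]
status_line, headers, body = split_http_message(chunks)
print(status_line)                      # HTTP/1.1 200 OK
print(headers.get("content-encoding"))  # br
print(len(body))                        # 3
```

Decoding only the header block as text and keeping the body as bytes avoids the mixed output seen in the log, where readable headers are followed by byte-string reprs.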

    To get readable data, you either need to decompress the body or prevent the compression in the first place. The latter can be done by not passing the client's original request header through unchanged, but replacing its Accept-Encoding field with one that only allows identity. This tells the server that the client does not support content compression, and a well-behaved server will then send the data uncompressed.
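Both options could be sketched roughly as follows. The helper names force_identity_encoding and decompress_body are hypothetical and not part of the original script. The sketch assumes plain HTTP/1.1 inside the TLS connection, a request header block that arrives in a single recv() chunk, and a complete body without chunked transfer encoding. Brotli support relies on the third-party brotli package, which is assumed to be installed; gzip and deflate use the standard library.

```python
import gzip
import zlib


def force_identity_encoding(request: bytes) -> bytes:
    """Return the request with its Accept-Encoding set to identity."""
    head, sep, body = request.partition(b"\r\n\r\n")
    if not sep:
        # No complete header block in this chunk: pass it on untouched.
        return request
    lines = head.split(b"\r\n")
    # Keep the request line and every header except Accept-Encoding.
    kept = [lines[0]] + [
        line for line in lines[1:]
        if not line.lower().startswith(b"accept-encoding:")
    ]
    kept.append(b"Accept-Encoding: identity")
    return b"\r\n".join(kept) + sep + body


def decompress_body(body: bytes, content_encoding: str) -> bytes:
    """Undo a single Content-Encoding on a complete response body."""
    encoding = content_encoding.strip().lower()
    if encoding in ("", "identity"):
        return body
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        # HTTP defines "deflate" as zlib-wrapped data.
        return zlib.decompress(body)
    if encoding == "br":
        import brotli  # third-party package, assumed to be installed
        return brotli.decompress(body)
    raise ValueError(f"unsupported Content-Encoding: {content_encoding!r}")


# Hypothetical request as it might arrive from the client.
request = (
    b"GET /style.css HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Accept-Encoding: gzip, deflate, br\r\n"
    b"\r\n"
)
rewritten = force_identity_encoding(request)
print(rewritten.decode("ascii"))
```

In relay_data, the first option would amount to calling s_ssl.sendall(force_identity_encoding(request)) instead of forwarding the request as received. The second option leaves the traffic untouched and applies decompress_body only when writing the log, at the cost of needing the whole body first.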