python parsing urllib urldecode

Python 3: replace %3D with = at the end of a bytes object as fast as possible


I have a bytes object that is actually a file in data URL format. It is about 500 KB.

I need to drop the 37-byte header (I did that with a slice) and replace %3D with = at the end of the file (this sequence can occur 0 to 2 times).

Unquoting with urllib.parse changes every escaped entry in the object, not only the ones at the end.

Is there a clean way to process this object?

    content_length = int(self.headers['Content-Length']) # <--- Gets the size of data
    post_body = self.rfile.read(content_length) # <--- Gets the data itself
    print(len(post_body))
    with open("1111", "wb") as fd:
        fd.write(post_body)

    post_body = post_body[37:len(post_body)]

    with open("decoded.png", "wb") as fh:
        fh.write(base64.decodebytes(post_body))

The problem is in the last line.

= characters may be added so that the last block contains four base64 characters, but in the POST request I get %3D instead of =.


Solution

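  • It seems to me that you need to "unquote" the URL-escaped (%xx) symbols.

    Python has a function for this: in Python 2.7 it is urllib.unquote, and in Python 3 it is urllib.parse.unquote. Because post_body is a bytes object and base64.decodebytes expects bytes, the bytes-returning variant urllib.parse.unquote_to_bytes is the better fit here (unquote returns a str). Sample usage would be:

    from urllib.parse import unquote_to_bytes
    
    post_body = unquote_to_bytes(post_body[37:])
      # my_list[i:] is short for my_list[i:len(my_list)]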
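
To check the whole path from request body to decoded bytes, here is a minimal sketch. It assumes the 37-byte header from the question. The names HEADER_LEN and decode_post_body, and the fake request body, are made up for illustration. It uses urllib.parse.unquote_to_bytes so that the result stays a bytes object that base64.decodebytes accepts.

```python
import base64
from urllib.parse import unquote_to_bytes

HEADER_LEN = 37  # header size taken from the question; adjust if yours differs


def decode_post_body(post_body: bytes) -> bytes:
    """Drop the header, undo the %xx escapes, then base64-decode the rest."""
    payload = unquote_to_bytes(post_body[HEADER_LEN:])
    return base64.decodebytes(payload)


# Build a fake body: 37 placeholder header bytes plus escaped base64 data.
raw = b"h"
escaped = base64.b64encode(raw).replace(b"=", b"%3D")  # padding sent as %3D
fake_body = b"x" * HEADER_LEN + escaped

decoded = decode_post_body(fake_body)
```

The base64 alphabet contains no % character, so unquoting the whole payload should only touch sequences that really are escapes.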
    

    However, I don't know whether you want to apply it only to the last bytes, or only when the data ends with %3D. For that you can use .endswith(), which works on both strings and bytes, as long as the suffix has the same type as the object being tested:

    my_bytes.endswith(b'%3D')
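
If only the trailing escapes should be touched, one possible approach is to inspect just the last few bytes instead of scanning all 500 KB. The sketch below is an illustration, not a definitive implementation: fix_trailing_padding is a hypothetical helper name, and it assumes at most two trailing %3D sequences, as stated in the question. It also accepts the lowercase form %3d.

```python
import base64


def fix_trailing_padding(data: bytes) -> bytes:
    """Replace up to two trailing b'%3D' escapes with b'=' padding."""
    end = len(data)
    count = 0
    # Look only at the three bytes before `end`; .upper() also matches b'%3d'.
    while count < 2 and data[max(end - 3, 0):end].upper() == b"%3D":
        end -= 3
        count += 1
    # Keep everything before the escapes and append plain '=' padding.
    return data[:end] + b"=" * count


fixed = fix_trailing_padding(b"aA%3D%3D")
decoded = base64.decodebytes(fixed)
```

The slice data[:end] makes one copy of the payload, which is cheap at this size. Escapes elsewhere in the data are left untouched.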