Search code examples
pythonbase64python-imaging-librarywiresharkbmp

Strange base64 python decoding


From a stream of TCP segments, I pulled binary data with the help of wireshark, which I later found out is a bmp file. Then I load one big line of binary data, clear it from spaces, newline character and intermediate "=". I mean their separation at the end of each TCP segment.

Then I execute the following code:

import base64

tst = sec_orig.replace('\n', '').replace(' ', '').replace('=', '')

decoded_data = base64.b64decode(tst + '=', altchars=None, validate=True)

Raw data from wireshark:

Qk02MAEAAAAAADYEAAAoAAAAQAEAAPAAAAABAAgAAAAAAAAAAABCCwAAQgsAAAABAAAAAQAAAAAA
AAAAAAAAAKAAAPAAAPAAAAAA/PwA/PwAAPxw/AD8/PwAoKCgAEBAQABQMAAAWFhYANCg0ACgkHAA
oJBwANDQ0ADYyLQA1JgAANyEAAC4uLgAaPR0APCoAAAgICAAAJD8AAAA+ACoqKgAvLy8AMzMzADc
3NwA7OzsAPz8/AAAAAAAAAAQAAAAIAAAADAAAABEAAAAVAAAAGQAAAB0AAAAiAAAAJgAAACoAAAA
vAAAAMwAAADcAAAA7AAAAPwAAAAAAAAQAAAAIAAAADAAAABEAAAAVAAAAGQAAAB0AAAAiAAAAJgA
AACoAAAAvAAAAMwAAADcAAAA7AAAAPwAAAAAAAAQAAAAIAAAADAAAABEAAAAVAAAAGQAAAB0AAAA
iAAAAJgAAACoAAAAvAAAAMwAAADcAAAA7AAAAPwAAAD8AAAA/BAAAPwgAAD8MAAA/EQAAPxUAAD8
ZAAA/HQAAPyIAAD8mAAA/KgAAPy8AAD8zAAA/NwAAPzsAAD8/AAA/PwAAOz8AADc/AAAzPwAALz8
AACo/AAAmPwAAIj8AAB0/AAAZPwAAFT8AABE/AAAMPwAACD8AAAQ/AAAAPwAAAD8AAAA/BAAAPwg
AAD8MAAA/EQAAPxUAAD8ZAAA/HQAAPyIAAD8mAAA/KgAAPy8AAD8zAAA/NwAAPzsAAD8/AAA/PwA
AOz8AADc/AAAzPwAALz8AACo/AAAmPwAAIj8AAB0/AAAZPwAAFT8AABE/AAAMPwAACD8AAAQ/AAA
APwAAAD8ABAA/AAgAPwAMAD8AEQA/ABUAPwAZAD8AHQA/ACIAPwAmAD8AKgA/AC8APwAzAD8ANwA
/ADsAPwA/AD8APwA/AD8AOwA/ADcAPwAzAD8ALwA/ACoAPwAmAD8AIgA/AB0APwAZAD8AFQA/ABE
APwAMAD8ACAA/AAQAPwAAAAAAAAA 

...

BQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUF
BQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUF
BQUFBQUFBQUFBQUFBQUFBQkFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUF
BQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUJ

And I get a certain binary string from base64:

b'BM60\x01\x00\x00\x00\x00\x006\x04\x00\x00(\x00\x00\x00@\x01\x00\x00\xf0\x00\x00\x00\x01\x00\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00B\x0b\x00\x00B\x0b\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xa0\x00\x00\xf0\x00\x00\xf0\x00\x00\x00\x00\xfc\xfc\x00\xfc\xfc\x00\x00\xfcp\xfc\x00\xfc\xfc\xfc\x00\xa0\xa0\xa0\x00@@@\x00P0\x00\x00XXX\x00\xd0\xa0\xd0\x00\xa0\x90p\x00\xa0\x90p\x00\xd0\xd0\xd0\x00\xd8\xc8\xb4\x00\xd4\x98\x00\x00\xdc\x84\x00\x00\xb8\xb8\xb8\x00h\xf4t\x00\xf0\xa8\x00\x00   \x00\x00\x90\xfc\x00\x00\x00\xf8\x00\xa8\xa8\xa8\x00\xbc\xbc\xbc\x00\xcc\xcc\xcc\x00\xdc\xdc\xdc\x00\xec\xec\xec\x00
...
\t\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\t'

Then using the Pillow library I display this image obtained from the TCP stream segments and get the following image:

from PIL import Image
import io

Image.open(io.BytesIO(decoded_data))

distorted bmp image

enter image description here

As far as I understand, somewhere somehow incorrectly taken shift relative to bmp color matrix, but I do not understand where I make a mistake, can you please suggest

The image is displayed, but with a staggered pitch that I don't know how to normalize yet.


Solution

  • Using this script, I get an uncorrupted image shown:

    import io
    from pathlib import Path
    
    import yaml
    from PIL import Image
    
    
    packets = yaml.safe_load(Path("5bksih1B.txt").read_text())
    raw = b"".join([p["data"] for p in packets])
    im = Image.open(io.BytesIO(raw))
    im.show()
    

    There are two 320x240 images in the full data you pasted, here are the results:

    im1

    im2

    Note: that "marker" Qk02MA you noticed is actually the start of a bitmap header:

    >>> import base64
    >>> base64.b64decode("Qk02MA==")
    b'BM60'
    

    To continuously read frames from a stream, first parse this header, and then consume the correct amount of bytes from a buffer.