
How to determine the MIME type of a file from base64-encoded data in Python?


I have a base64 encoded string for a file.

encoded_data = '/9j/4AAQSkZJRgABAQEASABIAAD//gA7Q1JFQVRPUjogZ2QtanBlZyB2MS4wICh1c2luZyBJSkcgSlBFRyB2NjIpLCBxdWFsaXR5ID0gOTUK/9sAQwAGBAUGBQQGBgUGBwcGCAoQCgoJCQoUDg8MEBcUGBgXFB...'

How can I know the MIME-type of the file from that string?


Solution

  • In the general case, there is no way to reliably identify the MIME type of a piece of untagged data.

    Many file formats have magic markers (fixed byte sequences, usually at the start of the file) which can be used to determine the type of the file with reasonable accuracy. However, some magic markers are poorly chosen and might, for example, coincide with text in unrelated files. And of course, a completely random sequence of bits is not in any well-defined file format.

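    libmagic is the central component of the file command, which is commonly used to perform this task. There are several Python bindings, but https://pypi.org/project/python-libmagic/ seems to be the most popular and active. Be aware that other bindings, such as python-magic, also install a module named magic but expose a different interface, so the demo in this answer applies to python-libmagic specifically.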

    Of course, base64 is just a way to encode untyped binary data, so the string has to be decoded back to bytes before its contents can be inspected. Here's a quick demo with your sample data.

    import base64
    
    import magic
    
    encoded_data = '/9j/4AAQSkZJRgABAQEASABIAAD//gA7Q1JFQVRPUjogZ2QtanBlZyB2MS4wICh1c2luZyBJSkcgSlBFRyB2NjIpLCBxdWFsaXR5ID0gOTUK/9sAQwAGBAUGBQQGBgUGBwcGCAoQCgoJCQoUDg8MEBcUGBgXFB==='
    # Decode the base64 text to raw bytes, then let libmagic inspect them.
    with magic.Magic() as m:
        print(m.from_buffer(base64.b64decode(encoded_data)))
    

    Output:

    image/jpeg
    

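    (Note that I had to fix the padding at the end of your encoded_data: the sample in the question was truncated with "...", which I replaced with "=" padding so that it decodes.)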
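
If installing libmagic is not an option, the magic-marker idea can be roughly sketched with the standard library alone. This is only an illustration, not a replacement for libmagic: the SIGNATURES table is deliberately tiny, and decode_prefix and sniff_mime are hypothetical helper names made up for this example. It also sidesteps the padding problem by decoding only the first few characters, cut at a multiple of four.

```python
import base64

# A few well-known signatures ("magic markers"); deliberately incomplete.
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def decode_prefix(encoded, max_chars=64):
    """Decode only the start of a base64 string (hypothetical helper).

    Cutting at a multiple of 4 characters keeps the prefix valid base64,
    so missing or damaged padding at the end of the string never matters.
    """
    usable = min(len(encoded), max_chars)
    usable -= usable % 4
    return base64.b64decode(encoded[:usable])


def sniff_mime(encoded):
    """Return a guessed MIME type, or None if no known signature matches."""
    head = decode_prefix(encoded)
    for signature, mime in SIGNATURES:
        if head.startswith(signature):
            return mime
    return None


# The beginning of the sample from the question, without any padding.
sample = "/9j/4AAQSkZJRgABAQEASABIAAD//gA7Q1JFQVRPUjogZ2Q"
print(sniff_mime(sample))
```

For the sample above this prints image/jpeg, because the decoded data starts with the bytes FF D8 FF. A None result only means that none of the listed signatures matched, not that the data has no type, so for anything beyond a handful of known formats libmagic remains the better tool.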