Python XMP Toolkit is too strict ("Unrecognized TIFF prefix") when trying to read image metadata


I'm trying to read XMP data from a JPEG file in Python using the Python XMP Toolkit. However, I've run into several images for which the library fails to load any XMP data:

>>> from libxmp.utils import file_to_dict
>>> file_to_dict("/path/to/file.jpg")
Unrecognized TIFF prefix
{}

I get a similar error if I try to extract the image metadata using Pillow:

>>> from PIL import Image
>>> Image.open(file_path)._getexif()
  File "<string>", line unknown
SyntaxError: not a TIFF IFD

These images display correctly in the browser, running PIL's verify() method on the file doesn't raise any exceptions, and if I open the image as text I can see the image metadata in a format that looks correct. Finally, the (apparently less finicky) exif_read_data function in PHP can read all the metadata for these images without issue.

Is there a way to either (1) fix the image so it no longer has the bad 'TIFF prefix' or (2) tell either Pillow or libxmp to be less strict when trying to read XMP metadata?


Solution

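  • This isn't ideal, but I've found a workaround that may be 'good enough' for me: try libxmp first and, if it returns nothing, scan the raw file for the <x:xmpmeta> packet and pass only that packet to libxmp. Here is some code inspired by the answers in this question.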

    import libxmp
    import libxmp.utils
    
    
    def parse_xmp(path):
        # Try the normal libxmp route first; fall back to a raw scan if it finds nothing
        data = libxmp.utils.file_to_dict(path)
        if not data:
            data = dirty_parse_xmp(path)
        return data
    
    
    def dirty_parse_xmp(path):
        # Read as bytes: a JPEG is binary, so text-mode reading can fail on Python 3
        with open(path, 'rb') as infile:
            raw = infile.read()
    
        # Find the XMP data in the file
        close_marker = b'</x:xmpmeta>'
        xmp_open_tag = raw.find(b'<x:xmpmeta')
        if xmp_open_tag == -1:  # if XMP data is not found
            return {}
        xmp_close_tag = raw.find(close_marker, xmp_open_tag)
        if xmp_close_tag == -1:  # opening tag without a matching close
            return {}
        xmp_str = raw[xmp_open_tag:xmp_close_tag + len(close_marker)].decode('utf-8')
    
        # Pass just the XMP data to libxmp as a string
        meta = libxmp.XMPMeta()
        meta.parse_from_str(xmp_str)
        return libxmp.utils.object_to_dict(meta)
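
The same raw scan can also be paired with the standard library's XML parser when libxmp cannot be used at all. This is only a minimal sketch and not part of the original answer: the names `extract_xmp_packet` and `xmp_attributes` and the sample bytes are hypothetical, and it reads only properties stored as attributes on `rdf:Description` elements, so properties written as child elements are ignored.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XMP_START = b"<x:xmpmeta"
XMP_END = b"</x:xmpmeta>"


def extract_xmp_packet(raw):
    """Return the first <x:xmpmeta> element found in raw bytes, or None."""
    start = raw.find(XMP_START)
    if start == -1:
        return None
    end = raw.find(XMP_END, start)
    if end == -1:
        return None
    return raw[start:end + len(XMP_END)].decode("utf-8", errors="replace")


def xmp_attributes(packet):
    """Collect attributes of every rdf:Description as {'{namespace}name': value}."""
    root = ET.fromstring(packet)
    found = {}
    for desc in root.iter("{%s}Description" % RDF_NS):
        for key, value in desc.attrib.items():
            if not key.startswith("{%s}" % RDF_NS):  # skip rdf:about and similar
                found[key] = value
    return found


# Hypothetical bytes standing in for a JPEG with an embedded XMP packet
sample = (
    b"\xff\xd8\xff\xe1junk"
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/" '
    b'xmp:CreatorTool="ExampleTool"/>'
    b"</rdf:RDF></x:xmpmeta>"
    b"\xff\xd9"
)
packet = extract_xmp_packet(sample)
attrs = xmp_attributes(packet)
print(attrs)
```

For a real file, the bytes would come from `open(path, 'rb').read()`. Keys come back in ElementTree's `{namespace}name` form rather than with the `xmp:` style prefixes that libxmp reports.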