I'm trying to read XMP data from a JPG in Python using the Python XMP Toolkit. However, I've run into several images where the library fails to load any XMP data:
>>> from libxmp.utils import file_to_dict
>>> file_to_dict("/path/to/file.jpg")
Unrecognized TIFF prefix
{}
I get a similar error if I try to extract the image metadata using Pillow:
>>> from PIL import Image
>>> Image.open(file_path)._getexif()
File "<string>", line unknown
SyntaxError: not a TIFF IFD
These images display correctly in the browser, running PIL's verify() method on the file doesn't raise any exceptions, and if I open the image as text I can see the image metadata in a format that looks correct. Finally, the (apparently less finicky) exif_read_data function in PHP can read all the metadata for these images without issue.
Is there a way to either (1) fix the image so it no longer has the bad 'TIFF prefix' or (2) tell either Pillow or libxmp to be less strict when trying to read XMP metadata?
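One way to narrow this down is to look at the APP1 segments in the file directly. The sketch below is only a diagnostic, and it rests on a guess: that the "TIFF prefix" in the error refers to the TIFF header at the start of the Exif block (normally `II*\0` or `MM\0*`). The function name `list_app1_segments` is made up for illustration. It also simplifies the JPEG format by assuming every marker before the scan data carries a length field and that there are no fill bytes between segments.

```python
import struct

EXIF_HEADER = b"Exif\x00\x00"
XMP_HEADER = b"http://ns.adobe.com/xap/1.0/\x00"


def list_app1_segments(data):
    """Return a list of (kind, leading_bytes) for each APP1 segment in JPEG bytes."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    found = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # lost sync with the segment structure; stop rather than guess
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: compressed image data follows
            break
        # The 2-byte big-endian length counts itself but not the marker
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1:  # APP1 carries either Exif or XMP
            if payload.startswith(EXIF_HEADER):
                # The next 4 bytes should be the TIFF header
                found.append(("exif", payload[6:10]))
            elif payload.startswith(XMP_HEADER):
                start = len(XMP_HEADER)
                found.append(("xmp", payload[start:start + 10]))
            else:
                found.append(("unknown", payload[:10]))
        pos += 2 + length
    return found
```

Calling it as `list_app1_segments(open("/path/to/file.jpg", "rb").read())` shows whether an Exif block is present and what its first four bytes are. If they are neither `b"II*\x00"` nor `b"MM\x00*"`, that would be consistent with a malformed Exif header rather than a problem with the XMP packet itself.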
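This isn't ideal, but I've found a solution that may be 'good enough' for me. It tries libxmp first and, if that returns nothing, scans the file for the <x:xmpmeta> packet and passes just that packet to libxmp. Here is some code inspired by the answers in this question.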
import libxmp
import libxmp.utils


def parse_xmp(path):
    # Try the normal libxmp route first; fall back to a raw scan if it finds nothing
    data = libxmp.utils.file_to_dict(path)
    if not data:
        data = dirty_parse_xmp(path)
    return data


def dirty_parse_xmp(path):
    # Find the XMP data in the file
    xmp_data = b''
    xmp_started = False
    # Read as bytes: a JPEG is binary, so text mode can fail to decode on Python 3
    with open(path, 'rb') as infile:
        for line in infile:
            if not xmp_started:
                xmp_started = b'<x:xmpmeta' in line
            if xmp_started:
                xmp_data += line
                if b'</x:xmpmeta' in line:
                    break
        else:  # if XMP data is not found
            return {}
    xmp_open_tag = xmp_data.find(b'<x:xmpmeta')
    xmp_close_tag = xmp_data.find(b'</x:xmpmeta>')
    # 12 is len('</x:xmpmeta>'), so the slice includes the closing tag
    xmp_str = xmp_data[xmp_open_tag:xmp_close_tag + 12].decode('utf-8')
    # Pass just the XMP data to libxmp as a string
    meta = libxmp.XMPMeta()
    meta.parse_from_str(xmp_str)
    return libxmp.utils.object_to_dict(meta)
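A variation on the same fallback, as a rough sketch: instead of walking the file line by line, read it as bytes and pull the packet out with one regular expression. The names `XMP_PACKET_RE` and `extract_xmp_packet` are made up here. The sketch assumes the packet is UTF-8, that it sits in one contiguous run of bytes (so it would not handle extended XMP split across several segments), and that the file is small enough to hold in memory.

```python
import re

# Non-greedy match from the opening <x:xmpmeta ...> tag to the first closing tag.
# DOTALL lets '.' span newlines inside the packet.
XMP_PACKET_RE = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)


def extract_xmp_packet(raw):
    """Return the first XMP packet found in `raw` bytes as a str, or None."""
    match = XMP_PACKET_RE.search(raw)
    if match is None:
        return None
    # Assumption: the packet is UTF-8; undecodable bytes are replaced, not fatal
    return match.group(0).decode("utf-8", errors="replace")
```

The returned string can then be handed to `libxmp.XMPMeta().parse_from_str(...)` in the same way `dirty_parse_xmp` does, for example with `extract_xmp_packet(open(path, "rb").read())`.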