How to output JSON with PyExifTool


I am running the following code to output a list of image and video files on a shared drive along with the creation date. However, I just cannot seem to get the execute_json method working properly. The documentation for the project is not extensive.

def main():
    dir_name = '/Volumes/photo/phone backup/'
    tags = ["File Name", "CreateDate"]
    log_file = '/Volumes/photo/py_log.txt'

    file_names = getListOfFiles(dir_name)
    with exiftool.ExifTool() as e:
        metadata = e.execute_json(str(e.get_tags_batch(tags, file_names)))
    f = open(log_file, "w")
    f.write(metadata)
    f.close()

The error I receive is AttributeError: 'list' object has no attribute 'encode'. If I alter the code a bit to this,

    file_names = getListOfFiles(dir_name)
    with exiftool.ExifTool() as e:
        metadata = e.get_tags_batch(tags, file_names)
    f = open(log_file, "w")
    f.write(str(metadata))
    f.close()

The resulting file has this format,

[{'SourceFile': '/Volumes/photo/phone backup/2017-05/MOV_0112.mp4', 'QuickTime:CreateDate': '2017:04:30 13:56:18'}, {'SourceFile': '/Volumes/photo/phone backup/2017-05/MOV_0174.mp4', 'QuickTime:CreateDate': '2017:06:09 06:12:47'}, {'SourceFile': '/Volumes/photo/phone backup/2017-05/MOV_0141.mp4', 'QuickTime:CreateDate': '2017:05:14 16:36:10'}

When I try to load this file as JSON, I get an error saying that it is not valid JSON.

Would appreciate some assistance as I am utterly lost.

I could replace the ' with " myself, but that seems an extremely ham-fisted solution.

EDIT: After playing around with it a bit more, I now call json.dump on the output metadata. Loading the result back with json.load then works without a problem. But doing the following,

for key in file_data:
    print (key)

results in this output,

{'SourceFile': '/Volumes/photo/phone backup/2017-05/MOV_0112.mp4', 'QuickTime:CreateDate': '2017:04:30 13:56:18'}
{'SourceFile': '/Volumes/photo/phone backup/2017-05/MOV_0174.mp4', 'QuickTime:CreateDate': '2017:06:09 06:12:47'}

So what ExifTool is spitting out is not properly formatted JSON at all. It's basically just a large text dump.


Solution

  • I think you may be misunderstanding the documentation. If we look at the docs for the execute_json method, it says:

    Execute the given batch of parameters and parse the JSON output.

    This method is similar to :py:meth:execute(). It automatically adds the parameter -j to request JSON output from exiftool and parses the output. The return value is a list of dictionaries, mapping tag names to the corresponding values.

    This clearly states that the exiftool module parses the output, which is to say it reads the JSON data and returns a list of Python data structures.

    Similarly, the documentation for get_tags_batch says:

    The format of the return value is the same as for :py:meth:execute_json().
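
Put differently, both methods hand back ordinary Python objects rather than JSON text. The following is a minimal sketch of that distinction, assuming a hand-written sample list shaped like the output in the question (no ExifTool call is made). It compares what str() and json.dumps() produce from the same data:

```python
import json

# Hand-written sample shaped like the question's output; this is an
# assumption for illustration and does not call ExifTool.
metadata = [
    {
        "SourceFile": "/Volumes/photo/phone backup/2017-05/MOV_0112.mp4",
        "QuickTime:CreateDate": "2017:04:30 13:56:18",
    },
]

as_repr = str(metadata)         # Python repr: strings are single-quoted
as_json = json.dumps(metadata)  # JSON text: strings are double-quoted

# The repr is not valid JSON, so json.loads rejects it.
try:
    json.loads(as_repr)
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False

# The json.dumps output round-trips back to the same list of dicts.
round_trip = json.loads(as_json)

print("str() output parses as JSON:", repr_is_json)
print("json.dumps() round-trips:", round_trip == metadata)
```

The single quotes in the str() output are what make a JSON parser reject the file, which is why replacing them by hand is unnecessary once json.dump is used.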

    So when you do this...

        file_names = getListOfFiles(dir_name)
        with exiftool.ExifTool() as e:
            metadata = e.get_tags_batch(tags, file_names)
        f = open(log_file, "w")
        f.write(str(metadata))
        f.close()
    

    ...you're just writing the string representation of that data to a file (i.e., the output of str(metadata)), which is Python's repr with single-quoted strings rather than JSON. To actually write JSON data, you would need to import json and then pass the open file object (not the path) to json.dump:

    with open(log_file, 'w') as fd:
      json.dump(metadata, fd)
    

    To read in this file, we would use json.load:

    with open(log_file, 'r') as fd:
      metadata = json.load(fd)
    

    This gets us the original list of dictionaries. For example, we could iterate over it like this:

    for imageinfo in metadata:
      print('CreateDate:', imageinfo.get('EXIF:CreateDate', 'unknown'))
    

    Here's the complete sample code I used to test this:

    import exiftool
    import json
    import os
    
    
    file_names = [f'images/{fn}' for fn in os.listdir('images')]
    tags = ["FileName", "CreateDate"]  # ExifTool tag names contain no spaces
    with exiftool.ExifTool() as e:
        metadata = e.get_tags_batch(tags, file_names)
    
    with open('md.json', 'w') as fd:
        json.dump(metadata, fd)
    
    
    with open('md.json', 'r') as fd:
        metadata = json.load(fd)
    
    
    for imageinfo in metadata:
        print('Created: {}'.format(
            imageinfo.get('EXIF:CreateDate', 'unknown')
        ))
    

    Running this against my local images directory, which contains 7 random images, results in:

    Created: 2020:01:26 15:06:12
    Created: 2020:04:16 18:13:48
    Created: unknown
    Created: unknown
    Created: 2020:01:26 15:05:54
    Created: 2020:04:16 18:07:41
    Created: 2020:01:26 15:07:58
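
The "unknown" lines may simply be files with no creation date, but they can also come from files that store it under a different group prefix. The output in the question, for instance, uses QuickTime:CreateDate rather than EXIF:CreateDate. Below is a minimal sketch of a group-agnostic lookup. The helper name find_tag and the sample data are hypothetical, written for illustration, and not part of PyExifTool:

```python
def find_tag(info, name, default="unknown"):
    """Return the first value whose key is `name` or ends with ':<name>'.

    Hypothetical helper, not part of PyExifTool.
    """
    for key, value in info.items():
        if key == name or key.endswith(":" + name):
            return value
    return default


# Hand-written sample entries (an assumption for illustration): one
# QuickTime-style key, one EXIF-style key, and one with no date at all.
sample = [
    {"SourceFile": "a.mp4", "QuickTime:CreateDate": "2017:04:30 13:56:18"},
    {"SourceFile": "b.jpg", "EXIF:CreateDate": "2020:01:26 15:06:12"},
    {"SourceFile": "c.png"},
]

created = [find_tag(info, "CreateDate") for info in sample]

for info, value in zip(sample, created):
    print("{}: {}".format(info["SourceFile"], value))
```

Matching on the part after the colon keeps the loop independent of whether a given file is a video or a still image.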