python optimization data-structures dictionary mutagen

More Efficient Way of Reading Large List of dicts


For my project I use the mutagen library to read ID3 tags from 5000+ mp3 files. After reading the tags, I construct the following objects from them.

    class Track:
        def __init__(self, artist, title, album=None):
            self.artist = artist
            self.title = title
            self.album = album  # keep the album passed in, if any

        def __str__(self):
            return "Track: %s : %s" % (self.artist, self.title)

        def set_album(self, album):
            self.album = album

    class Album:
        def __init__(self, artist, title, year='', genre='', tracks=None):
            self.artist = artist
            self.year = year
            self.genre = genre
            self.title = title
            # Use the given tracks if any; otherwise start a fresh list per album.
            self.tracks = tracks if tracks is not None else []

        def __str__(self):
            return "Album: %s : %s [%d]" % (self.artist, self.title, len(self.tracks))

        def add_track(self, track):
            self.tracks.append(track)

The problem is that some files are missing required tags (title, artist, or both), which raises a KeyError:

    import os
    import mutagen

    # Frames read: 'TALB' (album title), 'TIT2' (track title), 'TPE1' (artist),
    # 'TDRC' (year), and 'TCON' (genre).
    # `dir` (root folder) and `e` (file extension) are defined elsewhere.
    for root, dirs, files in os.walk(dir):
        for filename in files:
            if filename.lower().endswith(e):
                fullname = os.path.join(root, filename)
                try:
                    audio = mutagen.File(fullname)
                    track = Track(audio['TPE1'], audio['TIT2'])
                    album = Album(audio['TPE1'], audio['TALB'], audio['TDRC'], audio['TCON'])
                except Exception as err:  # named `err` so it does not clobber the extension `e`
                    print("Error on %s. %s" % (filename, type(err).__name__))

This only loads the files that have all of the tags, which is not good enough. I solved the problem with a series of if checks; that works fine and is fast enough. However, I wonder if there is a better way of handling this.
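The `if` checks themselves are not shown in the question. As a rough illustration only, a check of that kind could look like the sketch below. It assumes the tags behave like a mapping; `REQUIRED` and `missing_frames` are hypothetical names, and a plain dict with made-up values stands in for the object returned by `mutagen.File`.

```python
# Hypothetical helper, not the code from the question.
REQUIRED = ('TPE1', 'TIT2')  # artist and track title


def missing_frames(tags, required=REQUIRED):
    """Return the required frame IDs that are absent from `tags`."""
    return [frame for frame in required if frame not in tags]


tags = {'TPE1': 'Some Artist'}  # made-up data: the title frame is missing
absent = missing_frames(tags)
if absent:
    print("Skipping: missing %s" % ", ".join(absent))
```

Checking membership up front keeps the `try` block for genuinely unexpected errors, at the cost of one extra lookup per required frame.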


Solution

  • If your default value can be an empty string instead of None, you could use a defaultdict from the collections module.

    >>> 
    >>> from collections import defaultdict
    >>> d = defaultdict(str)
    >>> d['a'] = 'data'
    >>> d['b'] = 1
    >>> d
    defaultdict(<type 'str'>, {'a': 'data', 'b': 1})
    >>> a = d['a']
    >>> b = d['b']
    >>> c = d['c']
    >>> a, b, c
    ('data', 1, '')
    >>> d
    defaultdict(<type 'str'>, {'a': 'data', 'c': '', 'b': 1})
    >>>
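To tie this back to the tag-reading loop, one option is to copy the tags into a `defaultdict(str)` before indexing, so that a missing frame reads as `''` instead of raising. This is a minimal sketch, assuming the tags can be passed to the `dict` constructor; `with_defaults` is a hypothetical helper, and a plain dict with made-up values stands in for the mutagen object.

```python
from collections import defaultdict


def with_defaults(tags):
    """Copy a tag mapping into a defaultdict so missing frames read as ''."""
    return defaultdict(str, tags)


raw = {'TPE1': 'Some Artist', 'TALB': 'Some Album'}  # made-up data, no 'TIT2'
tags = with_defaults(raw)

artist = tags['TPE1']  # present in the source mapping
title = tags['TIT2']   # missing, so defaultdict supplies ''
print(repr(artist), repr(title))
```

As the session above shows, reading a missing key also inserts it into the defaultdict. If that side effect is unwanted, `raw.get('TIT2', '')` on the ordinary dict returns the same default without modifying anything.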