
How to access value element in iTunes xml using ElementTree?


I'm trying to 'export' an XML playlist to an HTML table for sharing. But the iTunes library file uses key/value pairs instead of more meaningful XML tags. Is there a simple way to also get the value element in these key/value pairs?

The code below gets me as far as the text of each <key>, i.e. Track ID, Name, Artist, Album Artist, etc., but I can't seem to find a way to also get the value of the element that follows each key, i.e. <integer> 49924, or <string> Ep. 35 | What Do Your... Can (should) I do this with ElementTree, or should I move along to regular expressions or some other library? Thanks!

data = '''<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Major Version</key><integer>1</integer>
    <key>Minor Version</key><integer>1</integer>
    <key>Date</key><date>2019-01-21T07:31:15Z</date>
    <key>Application Version</key><string>12.8.0.150</string>
    <key>Features</key><integer>5</integer>
    <key>Show Content Ratings</key><true/>
    <key>Music Folder</key><string>file:///Users/Music/iTunes/iTunes%20Media/</string>
    <key>Library Persistent ID</key><string>75E62CF156F5AE1B</string>
    <key>Tracks</key>
    <dict>
        <key>49924</key>
        <dict>
            <key>Track ID</key><integer>49924</integer>
            <key>Name</key><string>Ep. 35 | What Do Your Morals Taste Like? | Guest: Jonathan Haidt</string>
            <key>Artist</key><string>Blaze Podcast Network</string>
            <key>Album Artist</key><string>Blaze Podcast Network</string>
            <key>Album</key><string>Something's Off with Andrew Heaton</string>
            <key>Genre</key><string>News &#38; Politics</string>
            <key>Kind</key><string>MPEG audio file</string>
            <key>Size</key><integer>48123940</integer>
            <key>Total Time</key><integer>3004133</integer>
            <key>Year</key><integer>2019</integer>
            <key>Date Modified</key><date>2019-01-13T01:10:30Z</date>
            <key>Date Added</key><date>2019-01-13T01:10:30Z</date>
            <key>Bit Rate</key><integer>128</integer>
            <key>Sample Rate</key><integer>44100</integer>
            <key>Release Date</key><date>2019-01-11T12:00:00Z</date>
            <key>Artwork Count</key><integer>1</integer>
            <key>Persistent ID</key><string>5FAE7186A09E5D3E</string>
            <key>Disabled</key><true/>
            <key>Track Type</key><string>File</string>
            <key>Purchased</key><true/>
            <key>Podcast</key><true/>
            <key>Unplayed</key><true/>
            <key>Location</key><string>file:///Users/Music/iTunes/iTunes%20Media/Podcasts/Something's%20Off%20with%20Andrew%20Heaton/Ep.%2035%20_%20What%20Do%20Your%20Morals%20Taste%20Like_%20_%20Guest_%20Jonathan%20Haidt.mp3</string>
            <key>File Folder Count</key><integer>4</integer>
            <key>Library Folder Count</key><integer>1</integer>
        </dict>
    </dict>
</dict>
</plist>'''
from xml.etree import ElementTree as ET
xml = ET.fromstring(data)
lst = xml.findall('dict/dict/dict/key')
for item in lst:
    print(item.text)

Solution

  • Question: How to access value element in iTunes xml

    The following solution uses lxml.etree.iterparse to pair each <key> tag with the value tag that follows it (<integer>, <string>, <date>, <true/>, <false/> or a nested <dict>) and builds a Python dict {key: value} from those pairs.


    from lxml import etree
    import io
    
    class Playlist:
        def __init__(self, fh):
            """
            Initialize 'iterparse' to generate 'start' and 'end' events on all tags
    
            :param fh: File Handle from the XML File to parse
            """
            self.context = etree.iterparse(fh, events=("start", "end",))
    
        def _parse(self):
            """
            Yield only at 'end' event, except 'start' from tag 'dict'
            :return: yield current Element
            """
            for event, elem in self.context:
                if elem.tag == 'plist' or \
                        (event == 'start' and not elem.tag == 'dict'):
                    continue
                yield elem
    
        def _parse_key_value(self, key=None):
            _dict = {}
            for elem in self._parse():
                if elem.tag == 'key':
                    key = elem.text
                    continue
    
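                if elem.tag in ['integer', 'string', 'date']:
                    if key is not None:
                        _dict[key] = elem.text
                        key = None
                    else:
                        print('Missing key for value {}'.format(elem.text))
    
                elif elem.tag in ['true', 'false']:
                    # Booleans are empty tags; the tag name carries the value
                    _dict[key] = elem.tag == 'true'
                    # Reset the key so a closing </dict> right after a boolean
                    # is not mistaken for the start of a nested dict
                    key = None
    
                elif elem.tag == 'dict':
                    if key is not None:
                        # 'start' of a nested dict: parse it recursively
                        _dict[key] = self._parse_dict(key)
                        key = None
                    else:
                        # 'end' of the current dict: hand the result back
                        return elem, _dict
                else:
                    print('Unknown tag {}'.format(elem.tag))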
    
        def _parse_dict(self, key=None):
            elem = next(self._parse())
            elem, _dict = self._parse_key_value(elem.text)
            return _dict
    
        def __iter__(self):
            for elem in self._parse():
                if elem.tag == 'dict':
                    yield self._parse_dict()
                else:
                    print('Unknown tag {}'.format(elem.tag))
    
    if __name__ == "__main__":
    
        data = b'''<?xml...'''
    
        with io.BytesIO(data) as in_xml:
            for record in Playlist(in_xml):
                print("record:{}".format(record))
    
                for key, value in record.items():
                    print("{}:{}".format(key, value))
    

    Output:

    record:{'Major Version': '1', 'Minor Version': '1'... (omitted for brevity)
        Major Version:1
        Minor Version:1
        Date:2019-01-24T10:31:15Z
        Tracks:{'99244': {'Track ID': '99244', 'Artist': 'Blaze Podcast Network', ... (omitted for brevity)}}
    

    Tested with Python: 3.5
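
    For comparison, the same key/value pairing can be sketched with only the standard library, since the question asks whether ElementTree is enough. This is a minimal sketch, not the method used above: it assumes the children of every <dict> strictly alternate <key>, value, <key>, value, as they do in the question's data. The names SAMPLE and dict_to_python are hypothetical, and SAMPLE is a shortened copy of the question's XML. The last lines parse the same bytes with the standard-library plistlib module, which reads this format directly and also converts <integer> and <date> values to Python types.

```python
import plistlib
from xml.etree import ElementTree as ET

# Shortened copy of the question's data (assumption: same structure).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>Major Version</key><integer>1</integer>
    <key>Show Content Ratings</key><true/>
    <key>Tracks</key>
    <dict>
        <key>49924</key>
        <dict>
            <key>Track ID</key><integer>49924</integer>
            <key>Artist</key><string>Blaze Podcast Network</string>
            <key>Podcast</key><true/>
        </dict>
    </dict>
</dict>
</plist>"""


def dict_to_python(dict_elem):
    """Convert a plist <dict> element into a Python dict (hypothetical helper)."""
    children = list(dict_elem)
    result = {}
    # Children alternate <key>, value, <key>, value, ... so pair them up.
    for key_elem, value_elem in zip(children[0::2], children[1::2]):
        if value_elem.tag == 'dict':
            value = dict_to_python(value_elem)      # nested dict: recurse
        elif value_elem.tag in ('true', 'false'):
            value = value_elem.tag == 'true'        # empty tag, name is the value
        else:
            value = value_elem.text                 # kept as text, no type conversion
        result[key_elem.text] = value
    return result


# ElementTree: the root is <plist>, its first child is the top-level <dict>.
library = dict_to_python(ET.fromstring(SAMPLE).find('dict'))
track = library['Tracks']['49924']
print(track['Artist'], track['Track ID'], track['Podcast'])

# plistlib: parses the same bytes and returns typed values (int, bool, ...).
typed = plistlib.loads(SAMPLE)
print(typed['Tracks']['49924']['Track ID'] + 1)
```

    Each track dict can then be written out as one row of an HTML table. If the typed values matter (integers, dates), plistlib is likely the simpler choice; the ElementTree version keeps every scalar as text, like the solution above.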