I'm trying to export an XML playlist to an HTML table for sharing. But the iTunes library file uses key/value pairs instead of more meaningful XML tags. Is there a simple way to also get the value in each of these key/value pairs?
The code below gets me as far as the text of each <key>, i.e. Track ID, Name, Artist, Album Artist, etc., but I can't find a way to also get the value in the element that follows the key, i.e. <integer>49924</integer> or <string>Ep. 35 | What Do Your...</string>. Can (should) I do this with ElementTree, or should I move on to regular expressions or some other library? Thanks!
data = '''<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Major Version</key><integer>1</integer>
<key>Minor Version</key><integer>1</integer>
<key>Date</key><date>2019-01-21T07:31:15Z</date>
<key>Application Version</key><string>12.8.0.150</string>
<key>Features</key><integer>5</integer>
<key>Show Content Ratings</key><true/>
<key>Music Folder</key><string>file:///Users/Music/iTunes/iTunes%20Media/</string>
<key>Library Persistent ID</key><string>75E62CF156F5AE1B</string>
<key>Tracks</key>
<dict>
<key>49924</key>
<dict>
<key>Track ID</key><integer>49924</integer>
<key>Name</key><string>Ep. 35 | What Do Your Morals Taste Like? | Guest: Jonathan Haidt</string>
<key>Artist</key><string>Blaze Podcast Network</string>
<key>Album Artist</key><string>Blaze Podcast Network</string>
<key>Album</key><string>Something's Off with Andrew Heaton</string>
<key>Genre</key><string>News &#38; Politics</string>
<key>Kind</key><string>MPEG audio file</string>
<key>Size</key><integer>48123940</integer>
<key>Total Time</key><integer>3004133</integer>
<key>Year</key><integer>2019</integer>
<key>Date Modified</key><date>2019-01-13T01:10:30Z</date>
<key>Date Added</key><date>2019-01-13T01:10:30Z</date>
<key>Bit Rate</key><integer>128</integer>
<key>Sample Rate</key><integer>44100</integer>
<key>Release Date</key><date>2019-01-11T12:00:00Z</date>
<key>Artwork Count</key><integer>1</integer>
<key>Persistent ID</key><string>5FAE7186A09E5D3E</string>
<key>Disabled</key><true/>
<key>Track Type</key><string>File</string>
<key>Purchased</key><true/>
<key>Podcast</key><true/>
<key>Unplayed</key><true/>
<key>Location</key><string>file:///Users/Music/iTunes/iTunes%20Media/Podcasts/Something's%20Off%20with%20Andrew%20Heaton/Ep.%2035%20_%20What%20Do%20Your%20Morals%20Taste%20Like_%20_%20Guest_%20Jonathan%20Haidt.mp3</string>
<key>File Folder Count</key><integer>4</integer>
<key>Library Folder Count</key><integer>1</integer>
</dict>
</dict>
</dict>
</plist>'''
from xml.etree import ElementTree as ET
xml = ET.fromstring(data)
lst = xml.findall('dict/dict/dict/key')
for item in lst:
    print(item.text)
Question: How to access value element in iTunes xml
The following solution uses lxml.etree.iterparse and pairs each <key> tag with the value tag that follows it to build a Python dict {key: value}.
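Before the full class, the pairing idea itself can be sketched in isolation. This is a minimal sketch using the standard library's xml.etree.ElementTree (the module from the question) rather than lxml, and it assumes the children of every <dict> strictly alternate between a <key> and its value. The helper name dict_to_python and the shortened sample are made up for illustration, and <array> values are not handled.

```python
from xml.etree import ElementTree as ET


def dict_to_python(dict_elem):
    """Convert a plist <dict> element into a Python dict (hypothetical helper)."""
    children = list(dict_elem)
    result = {}
    # Assumption: children alternate <key>, value, <key>, value, ...
    for key_elem, value_elem in zip(children[0::2], children[1::2]):
        if value_elem.tag == 'dict':
            # Nested dict: recurse
            value = dict_to_python(value_elem)
        elif value_elem.tag in ('true', 'false'):
            # Booleans are empty tags; the tag name is the value
            value = value_elem.tag == 'true'
        else:
            # <integer>, <string>, <date>: keep the text as-is
            value = value_elem.text
        result[key_elem.text] = value
    return result


# Shortened sample in the same shape as the question's data
sample = '''<plist version="1.0"><dict>
<key>Tracks</key>
<dict>
<key>49924</key>
<dict>
<key>Track ID</key><integer>49924</integer>
<key>Artist</key><string>Blaze Podcast Network</string>
<key>Podcast</key><true/>
</dict>
</dict>
</dict></plist>'''

library = dict_to_python(ET.fromstring(sample).find('dict'))
for track in library['Tracks'].values():
    # prints: 49924 Blaze Podcast Network True
    print(track['Track ID'], track['Artist'], track['Podcast'])
```

This version loads the whole document into memory, which is usually fine for a single playlist. The iterparse approach streams the file instead, which may matter for a very large library.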
Used module and built-in functions:
from lxml import etree
import io
class Playlist:
    def __init__(self, fh):
        """
        Initialize 'iterparse' to generate 'start' and 'end' events on all tags

        :param fh: File Handle from the XML File to parse
        """
        self.context = etree.iterparse(fh, events=("start", "end",))

    def _parse(self):
        """
        Yield only at 'end' event, except 'start' from tag 'dict'

        :return: yield current Element
        """
        for event, elem in self.context:
            if elem.tag == 'plist' or \
                    (event == 'start' and not elem.tag == 'dict'):
                continue
            yield elem

    def _parse_key_value(self, key=None):
        _dict = {}
        for elem in self._parse():
            if elem.tag == 'key':
                # Remember the key until its value element arrives
                key = elem.text
                continue

            if elem.tag in ['integer', 'string', 'date']:
                if key is not None:
                    _dict[key] = elem.text
                    key = None
                else:
                    print('Missing key for value {}'.format(elem.text))

            elif elem.tag in ['true', 'false']:
                _dict[key] = elem.tag == 'true'
                # Reset the key so a closing </dict> right after a boolean
                # is not mistaken for the start of a nested dict
                key = None

            elif elem.tag == 'dict':
                if key is not None:
                    # 'start' of a nested dict that belongs to the pending key
                    _dict[key] = self._parse_dict(key)
                    key = None
                else:
                    # 'end' of the current dict
                    return elem, _dict
            else:
                print('Unknown tag {}'.format(elem.tag))

    def _parse_dict(self, key=None):
        # The first element inside a dict is expected to be a <key>
        elem = next(self._parse())
        elem, _dict = self._parse_key_value(elem.text)
        return _dict

    def __iter__(self):
        for elem in self._parse():
            if elem.tag == 'dict':
                yield self._parse_dict()
            else:
                print('Unknown tag {}'.format(elem.tag))
if __name__ == "__main__":
    data = b'''<?xml...'''
    with io.BytesIO(data) as in_xml:
        for record in Playlist(in_xml):
            print("record:{}".format(record))
            for key, value in record.items():
                print("{}:{}".format(key, value))
Output:
record:{'Major Version': '1', 'Minor Version': '1'... (omitted for brevity)
Major Version:1
Minor Version:1
Date:2019-01-24T10:31:15Z
Tracks:{'99244': {'Track ID': '99244', 'Artist': 'Blaze Podcast Network', ... (omitted for brevity)}}
Tested with Python: 3.5
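To get from such records to the HTML table the question asks for, something like the following might work. It is only a sketch: tracks_to_html is a hypothetical helper, the column list is an arbitrary choice, and the tracks literal stands in for the 'Tracks' entry of a parsed record. It uses html.escape from the standard library so that values such as "News & Politics" come out as valid HTML.

```python
from html import escape


def tracks_to_html(tracks, columns):
    """Render a dict of track dicts as an HTML table (hypothetical helper)."""
    header = ''.join('<th>{}</th>'.format(escape(c)) for c in columns)
    rows = []
    for track in tracks.values():
        # Missing keys become empty cells; every value is escaped
        cells = ''.join(
            '<td>{}</td>'.format(escape(str(track.get(c, ''))))
            for c in columns
        )
        rows.append('<tr>{}</tr>'.format(cells))
    return '<table>\n<tr>{}</tr>\n{}\n</table>'.format(header, '\n'.join(rows))


# Stand-in for record['Tracks'] from the parser above
tracks = {
    '49924': {
        'Track ID': '49924',
        'Artist': 'Blaze Podcast Network',
        'Genre': 'News & Politics',
    },
}

html_table = tracks_to_html(tracks, ['Track ID', 'Artist', 'Genre'])
print(html_table)
```

Passing the column list explicitly keeps the table to the fields worth sharing, since a track dict carries many keys (Persistent ID, File Folder Count, and so on) that are unlikely to be wanted in a shared table.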