I am trying to parse an XML file that looks like this. I want to extract the attributes of each kategorie element, i.e. id, parent_id, etc.:
<?xml version="1.0" encoding="UTF-8"?>
<test timestamp="20210113">
<kategorien>
<kategorie id="1" parent_id="0">
Sprache
</kategorie>
</kategorien>
</test>
I am trying this:
fields = ['id', 'parent_id']
with open('output.csv', 'wb') as fp:
    writer = csv.writer(fp)
    writer.writerow(fields)
    tree = ET.parse('./file.xml')
    # from your example Locations is the root and Location is the first level
    for elem in tree.getroot():
        writer.writerow([(elem.get(name) or '').encode('utf-8')
                         for name in fields])
but I get this error:
in <module>
writer.writerow(fields)
TypeError: a bytes-like object is required, not 'str'
even though I am already using encode('utf-8') in my code. How can I get rid of this error?
EDIT 2: If you want to read attributes from nested (child) elements, there are two ways:
for elem in root:
    for child in elem:
        print([(child.attrib.get(name) or 'c') for name in fields])
Output:
['1', '0']
Note that this also returns values for any child element that has id and parent_id attributes, even if it is not named kategorie.
for elem in root.iter('kategorie'):
    print([(elem.attrib.get(name) or 'c') for name in fields])
Output:
['1', '0']
This method returns values for every element named kategorie, at any depth of the tree.
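The difference between the two loops only shows up once the file contains more than one kind of element. Here is a minimal sketch, assuming a hypothetical extended sample: the nested kategorie with id 2 and the gruppe element are invented for illustration and are not in the original file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the nested <kategorie id="2"> and the <gruppe>
# element are invented here to show how the two loops differ.
XML_BYTES = b"""<?xml version="1.0" encoding="UTF-8"?>
<test timestamp="20210113">
<kategorien>
<kategorie id="1" parent_id="0">Sprache
<kategorie id="2" parent_id="1">Deutsch</kategorie>
</kategorie>
<gruppe id="9" parent_id="0">Sonstiges</gruppe>
</kategorien>
</test>
"""

fields = ['id', 'parent_id']
root = ET.fromstring(XML_BYTES)

# Way 1: walk two levels down from the root, whatever the tag name is.
by_position = [[child.get(name, '') for name in fields]
               for elem in root
               for child in elem]

# Way 2: select every element named kategorie, at any depth.
by_name = [[elem.get(name, '') for name in fields]
           for elem in root.iter('kategorie')]

print(by_position)  # [['1', '0'], ['9', '0']]
print(by_name)      # [['1', '0'], ['2', '1']]
```

The first loop picks up gruppe and misses the nested kategorie; the second does the opposite. Which one fits depends on whether position or tag name identifies the elements you care about.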
EDIT 1: For the issue in comments:
<?xml version="1.0"?>
<kategorien>
<kategorie id="1" parent_id="0">
Sprache
</kategorie>
</kategorien>
For the above XML file, the code seems to work perfectly:
fields = ['id', 'parent_id']
for elem in tree.getroot():
    print([(elem.attrib.get(name) or 'c') for name in fields])
Output:
['1', '0']
Original Answer: It looks like you are looking in the wrong place for the error. It actually occurs at:
writer.writerow(fields)
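fields is a list of str, not bytes, and the file is opened in binary mode (wb), which is why you get the error. I would recommend changing the file mode from wb to w. Looking at the rest of the code, though, it looks like you want to write bytes, in which case the header would need encoding as well:
writer.writerow([x.encode('utf-8') for x in fields])
encode() just converts a str to bytes. Be aware that on Python 3 csv.writer always passes str to the file object, so a file opened with wb will still raise this TypeError even after encoding. Opening the file in text mode (w, with newline='') and dropping the encode('utf-8') calls is the reliable fix.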
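Putting the text-mode approach together, here is a minimal sketch for Python 3, assuming the sample XML from the question. The helper names write_rows and export_csv and the in-memory buffer are illustrative and not part of the original code.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Sample document from the question, inlined so the sketch is self-contained.
XML_BYTES = b"""<?xml version="1.0" encoding="UTF-8"?>
<test timestamp="20210113">
<kategorien>
<kategorie id="1" parent_id="0">
Sprache
</kategorie>
</kategorien>
</test>
"""

FIELDS = ['id', 'parent_id']


def write_rows(root, fp, fields=FIELDS):
    """Write a header row plus one row per kategorie element to fp."""
    writer = csv.writer(fp)
    writer.writerow(fields)  # plain str values, no encode() needed
    for elem in root.iter('kategorie'):
        writer.writerow([elem.get(name, '') for name in fields])


def export_csv(root, path):
    """Write the rows to a file opened in text mode."""
    # newline='' lets the csv module control line endings itself;
    # the encoding argument replaces the manual encode('utf-8') calls.
    with open(path, 'w', newline='', encoding='utf-8') as fp:
        write_rows(root, fp)


root = ET.fromstring(XML_BYTES)

# Demonstrate with an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
write_rows(root, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, call export_csv(root, 'output.csv'). The encoding is handled by open(), so the rows stay as ordinary strings all the way through.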