I am struggling to find a proper answer to this, so it would be great if someone could help me solve it. I have a deeply nested XML file which I want to convert into a table. The XML looks like this:
<Motherofall>
<Parent>
<Child>
<val1>XX1</val1>
<Child2>
<val2>YY1</val2>
<val2>YY2</val2>
<Child2>
<val2>YY3</val2>
<val2>YY4</val2>
</parent>
+<parent>
+<parent>
</Motherofall>
Eventually, what I want as output is a table with a column val1 and a column val2, so val1 is repeated for every val2 of its parent.
import xml.etree.ElementTree as et
tree = et.parse(last_file)
for node in tree.findall('.//Parent'):
    XX = node.find('.//Child')
    print(XX.text)
    for node2 in tree.findall('.//Child2'):
        YY = node2.find('.//val1')
        print(YY.text)
As one might notice, I am fairly new to this, and I could not find a fitting answer.
I started by bringing some order to your input file (e.g. adding the missing closing tags), so that it contains:
<Motherofall>
  <parent>
    <Child>
      <val1>XX1</val1>
    </Child>
    <Child2>
      <val2>YY1</val2>
      <val2>YY2</val2>
    </Child2>
    <Child2>
      <val2>YY3</val2>
      <val2>YY4</val2>
    </Child2>
  </parent>
  <parent>
    <Child>
      <val1>XX2</val1>
    </Child>
    <Child2>
      <val2>YY1</val2>
      <val2>YY2</val2>
    </Child2>
    <Child2>
      <val2>YY3</val2>
    </Child2>
  </parent>
</Motherofall>
The first part of the code reads the XML:
import xml.etree.ElementTree as et
tree = et.parse('Input.xml')
root = tree.getroot()
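To try this without a file on disk, the same kind of tree can be built from a string. This is only a small sketch: `xml_text` is a hypothetical variable holding a shortened copy of the XML above. It also shows that tag matching in ElementTree is case-sensitive, so searching for `'Parent'` finds nothing when the tags are written `parent`:

```python
import xml.etree.ElementTree as et

# Hypothetical inline sample, shortened from the cleaned-up file above
xml_text = """<Motherofall>
  <parent><Child><val1>XX1</val1></Child></parent>
  <parent><Child><val1>XX2</val1></Child></parent>
</Motherofall>"""

root = et.fromstring(xml_text)   # fromstring returns the root element directly

lower = root.findall('parent')   # matches both <parent> elements
upper = root.findall('Parent')   # matches nothing: tag names are case-sensitive
print(len(lower), len(upper))
```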
Then, to read the data and build a pandas DataFrame, you can run:
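import pandas as pd

rows = []
for par in root.iter('parent'):
    # val1 is shared by all rows coming from this parent
    xx = par.findtext('Child/val1')
    # one output row per val2, whichever Child2 it sits in
    for vv in par.findall('Child2/val2'):
        tt = vv.text
        rows.append([xx, tt])
df = pd.DataFrame(rows, columns=['x', 'y'])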
The result is:
     x    y
0  XX1  YY1
1  XX1  YY2
2  XX1  YY3
3  XX1  YY4
4  XX2  YY1
5  XX2  YY2
6  XX2  YY3
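If it helps, the same loop can be wrapped in a function that needs only the standard library. This is a sketch rather than a definitive implementation: the function name `xml_to_rows`, the inline `sample` string and the empty-string default for a missing val1 are all assumptions made for illustration.

```python
import xml.etree.ElementTree as et

def xml_to_rows(xml_text):
    """Return one (val1, val2) tuple per val2 element, parent by parent."""
    root = et.fromstring(xml_text)
    rows = []
    for par in root.iter('parent'):
        # Assumption: fall back to '' when a parent has no Child/val1
        val1 = par.findtext('Child/val1', default='')
        for vv in par.findall('Child2/val2'):
            rows.append((val1, vv.text))
    return rows

# Hypothetical sample, shortened from the cleaned-up input
sample = """<Motherofall>
  <parent>
    <Child><val1>XX1</val1></Child>
    <Child2><val2>YY1</val2><val2>YY2</val2></Child2>
    <Child2><val2>YY3</val2></Child2>
  </parent>
  <parent>
    <Child><val1>XX2</val1></Child>
    <Child2><val2>YY1</val2></Child2>
  </parent>
</Motherofall>"""

rows = xml_to_rows(sample)
for row in rows:
    print(row)
```

The returned list can be passed to pd.DataFrame(rows, columns=['val1', 'val2']) to get the column names asked for in the question.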