I'm looking for some insights on how to properly display this xml:
<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
<PRODUCT>
<SUPPLIER>015</SUPPLIER>
<PRODUCT_DETAILS>
<KEYWORD>Paper</KEYWORD>
<PRODUCT_TYPE>major</PRODUCT_TYPE>
</PRODUCT_DETAILS>
<PRODUCT_FEATURES>
<REFERENCE>Class01</REFERENCE>
<FEATURE>
<FNAME>Colour</FNAME>
<FVALUE>white</FVALUE>
</FEATURE>
</PRODUCT_FEATURES>
</PRODUCT>
</HEADER>
For a simpler structure, that looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
<PRODUCT_DETAILS>
<KEYWORD>Paper</KEYWORD>
<PRODUCT_TYPE>major</PRODUCT_TYPE>
</PRODUCT_DETAILS>
<PRODUCT_FEATURES>
<FEATURE>
<FNAME>Colour</FNAME>
<FVALUE>white</FVALUE>
</FEATURE>
</PRODUCT_FEATURES>
</HEADER>
I've written a few lines that look like this:
import xml.etree.ElementTree as ET
import pandas as pd
tree = ET.parse('file.xml')
root = tree.getroot()
df = pd.DataFrame()
for i in range(0, len(root), 5):
details = [(child.tag, child.text) for child in root[i + 0]]
features = [(child[0].text, child[1].text) for child in root[i + 1]]
temp_df = pd.DataFrame([[i[1] for i in details + features]], columns=[i[0] for i in details + features])
df = pd.concat([df, temp_df])
df
# df.to_csv("file_export.csv", index=False)
... and produce this output:
KEYWORD PRODUCT_TYPE Colour
0 Paper major white
What edits do I need to make, to output this:
SUPPLIER KEYWORD PRODUCT_TYPE REFERENCE Colour
0 015 Paper major Class01 white
Thank you for your help!
Best, ~C
Here is one way to do it:
import xml.etree.ElementTree as ET
import pandas as pd
tree = ET.parse("file.xml")
root = tree.getroot()
data = {
"SUPPLIER": [],
"KEYWORD": [],
"PRODUCT_TYPE": [],
"REFERENCE": [],
"Colour": [],
}
for product in root:
data["SUPPLIER"].append(product[0].text)
data["KEYWORD"].append(product[1][0].text)
data["PRODUCT_TYPE"].append(product[1][1].text)
data["REFERENCE"].append(product[2][0].text)
data["Colour"].append(product[2][1][1].text)
df = pd.DataFrame(data)
print(df)
# Output
SUPPLIER KEYWORD PRODUCT_TYPE REFERENCE Colour
0 015 Paper major Class01 white