I've never worked with XML files. I have this XML file which is a really long file containing various information about different type of financial instrument.
<RiskFullInfo><dossier>1234</dossier><stockAmounts><stockAmount><dossier>1234</dossier><sottorubrica>0</sottorubrica><idInstrument>173</idInstrument><shortname>ABC.MI</shortname><name>abc</name><data>dd/mm/yyyy 00:00:00</data><quantita>number</quantita><pmc>somenumber</pmc><pmcEuro>eurovalyes</pmcEuro><marketprice>europrice</marketprice>
this is the structure repeated for n instruments.
What I would like to do is to put in a dataframe the text/values of fields idInstrument, shortname,quantita for ALL the n instruments in this file.
Once used the code to get the root with ET, I tried to extract the information I need with node.attrib and node.find but I get an empty dataframe.
import xml.etree.ElementTree as ET
tree = ET.parse(r'path')
root = tree.getroot()
df_cols = ["name", "Position"]
rows = []
for node in root:
s_name = node.attrib.get("shortname")
s_q = node.find("quantita")
Can someone help on this?
IIUC, you can just use read_xml
:
use_cols = ["idInstrument", "shortname", "quantita"]
df = pd.read_xml("file.xml", xpath=".//stockAmount")[use_cols]
Output :
print(df)
idInstrument shortname quantita
0 173 ABC.MI number
Otherwise, use this :
tree = ET.parse("file.xml")
root = tree.getroot().findall(".//stockAmount")
use_cols = ["idInstrument", "shortname", "quantita"]
df = pd.DataFrame({key: [e.find(key).text for e in root] for key in use_cols})