
How to create a CSV from an XML document with Python


I have a non-standard XML file that I need to convert to CSV. I tried xml.etree.ElementTree, but it doesn't work.

The columns should be code, displayName, codeDescription, isDisplayed, displayOrder and CodeSetName. The values are the ones after each equals sign.

Is there a way to convert this file to CSV using Python?

I've tried this:

import xml.etree.ElementTree as Xet
import pandas as pd

cols = ["code", "displayName", "codeDescription", "isDisplayed", "displayOrder","CodeSetName"]
rows = []

# Parsing the XML file
xmlparse1 = Xet.parse("./code.xml")
root = xmlparse1.getroot()


for i in root:
    
    code = i.find("Code").text
    displayName = i.find("displayName").text
    codeDescription = i.find("codeDescription").text
    isDisplayed=i.find("isDisplayed").text
    displayOrder=i.find("displayOrder").text
    CodeSetName=i.find("CodeSetName").text



rows.append({"code": code,"displayName": displayName,"codeDescription": codeDescription,"isDisplayed": isDisplayed,"displayOrder": displayOrder,"CodeSetName":CodeSetName})

df = pd.DataFrame(rows, columns=cols)

# Writing dataframe to csv
df.to_csv('code.csv')

But it doesn't work: the script can't get anything from .text. The format of the document is shown below.

<Codes>
    <Code code="value" displayName="value" codeDescription="value" isDisplayed="value" displayOrder="13423" CodeSetName="1234" />
    <Code code="value" displayName="value" codeDescription="value" isDisplayed="value" displayOrder="value" CodeSetName="value" />
</Codes>

Solution

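  • import xml.etree.ElementTree as Xet
    import pandas as pd
    
    cols = ["code", "displayName", "codeDescription",
            "isDisplayed", "displayOrder", "CodeSetName"]
    
    # Parse the XML file
    xmlparse1 = Xet.parse("code.xml")
    root = xmlparse1.getroot()
    
    # The values are attributes of each <Code> element, not child elements,
    # so read them from item.attrib instead of using find(...).text
    rows = [dict(item.attrib) for item in root]
    
    df = pd.DataFrame(rows, columns=cols)
    
    # Write the DataFrame to CSV without the index column
    df.to_csv("code.csv", index=False)
    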

    Output:

    code    displayName     codeDescription     isDisplayed     displayOrder    CodeSetName
    value   value           value               value           13423           1234
    value   value           value               value           value           value
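
The attempt in the question fails because each <Code> element is self-closing: it has no child elements and no text, so find(...) returns None and .text has nothing to read. The values are stored as attributes. As a rough sketch of the same conversion without pandas, the standard-library csv module can write the attributes directly. The names SAMPLE_XML, COLS and codes_to_csv are illustrative and not part of the original answer, and the sample is inlined from the question so the snippet runs on its own; with a real file you would parse "code.xml" and pass an open file object instead.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Sample input copied from the question (assumption: the real file has this shape).
SAMPLE_XML = """<Codes>
    <Code code="value" displayName="value" codeDescription="value" isDisplayed="value" displayOrder="13423" CodeSetName="1234" />
    <Code code="value" displayName="value" codeDescription="value" isDisplayed="value" displayOrder="value" CodeSetName="value" />
</Codes>"""

COLS = ["code", "displayName", "codeDescription",
        "isDisplayed", "displayOrder", "CodeSetName"]


def codes_to_csv(xml_text, out):
    """Write one CSV row per <Code> element, reading values from its attributes."""
    root = ET.fromstring(xml_text)
    writer = csv.DictWriter(out, fieldnames=COLS, lineterminator="\n")
    writer.writeheader()
    for item in root.iter("Code"):
        # The data lives in the element's attributes, not in item.text.
        # .get() returns None for a missing attribute, which csv writes as an empty cell.
        writer.writerow({col: item.get(col) for col in COLS})


# Write to an in-memory buffer here; a file opened with newline="" works the same way.
buffer = io.StringIO()
codes_to_csv(SAMPLE_XML, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Using item.get(col) for each expected column, rather than dict(item.items()), keeps the column order fixed and tolerates elements that are missing an attribute.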