
XML file returning NoneType traceback error while parsing


I have been trying to parse an XML file using ElementTree.parse and I keep getting a NoneType traceback error. I know many questions have been asked about this same issue, but the XML file I am trying to parse looks different from the ones in those questions. I don't really know what a standard XML file should look like, so I suspect my file may be the problem.

The file comes from archive.org and contains data from Stack Exchange. It is included here:

<?xml version="1.0" encoding="utf-8"?>
<users>
  <row Id="-1" Reputation="1" CreationDate="2015-03-16T22:02:56.000" DisplayName="Community" LastAccessDate="2015-03-16T22:02:56.000" Location="on the server farm" AboutMe="&lt;p&gt;Hi, I'm not really a person.&lt;/p&gt;&#xD;&#xA;&lt;p&gt;I'm a background process that helps keep this site clean!&lt;/p&gt;&#xD;&#xA;&lt;p&gt;I do things like&lt;/p&gt;&#xD;&#xA;&lt;ul&gt;&#xD;&#xA;&lt;li&gt;Randomly poke old unanswered questions every hour so they get some attention&lt;/li&gt;&#xD;&#xA;&lt;li&gt;Own community questions and answers so nobody gets unnecessary reputation from them&lt;/li&gt;&#xD;&#xA;&lt;li&gt;Own downvotes on spam/evil posts that get permanently deleted&lt;/li&gt;&#xD;&#xA;&lt;li&gt;Own suggested edits from anonymous users&lt;/li&gt;&#xD;&#xA;&lt;li&gt;&lt;a href=&quot;http://meta.stackoverflow.com/a/92006&quot;&gt;Remove abandoned questions&lt;/a&gt;&lt;/li&gt;&#xD;&#xA;&lt;/ul&gt;" Views="22" UpVotes="74" DownVotes="991" AccountId="-1" />
  <row Id="1" Reputation="101" CreationDate="2015-03-17T14:49:42.463" DisplayName="Adam Lear" LastAccessDate="2022-09-20T19:44:00.090" Location="New York, NY" AboutMe="&#xA;&lt;p&gt;Developer at Stack Overflow focusing on public Q&amp;amp;A. Russian Canadian working in the American idiom.&lt;/p&gt;&#xA;&lt;p&gt;Once upon a time:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;community manager at Stack Overflow&lt;/li&gt;&#xA;&lt;li&gt;elected moderator on Stack Overflow and Software Engineering&lt;/li&gt;&#xA;&lt;li&gt;desktop software developer ¯\_(ツ)_/¯&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;Email me a link to your favorite Wikipedia article: &lt;code&gt;[email protected]&lt;/code&gt;.&lt;/p&gt;&#xA;" Views="76" UpVotes="0" DownVotes="0" AccountId="37099" />
  <row Id="3" Reputation="4186" CreationDate="2015-03-17T14:59:26.040" DisplayName="dfife" LastAccessDate="2022-03-01T21:34:56.797" WebsiteUrl="http://www.dustinfife.net" AboutMe="" Views="70" UpVotes="54" DownVotes="0" AccountId="136853" />
  <row Id="4" Reputation="1" CreationDate="2015-03-17T14:59:37.893" DisplayName="Conor Semler" LastAccessDate="2015-03-17T15:14:42.150" Views="3" UpVotes="0" DownVotes="0" AccountId="5091225" />
  <row Id="5" Reputation="443" CreationDate="2015-03-17T14:59:56.080" DisplayName="Niall C." LastAccessDate="2023-04-01T03:09:46.917" WebsiteUrl="http://diy.stackexchange.com/users/22" Location="Portland, OR" AboutMe="&lt;p&gt;I'm here because when I'm working on my house, I think of things that I'd like to make.&lt;/p&gt;&#xA;" Views="2" UpVotes="28" DownVotes="2" AccountId="10331" />
  <row Id="6" Reputation="525" CreationDate="2015-03-17T15:02:41.573" DisplayName="CoAstroGeek" LastAccessDate="2022-11-16T20:19:09.717" AboutMe="&lt;p&gt;Working in the space business in Colorado Springs for about 15 years with a focus on astrodynamics, orbit design, collision avoidance analysis and high performance computing.&lt;/p&gt;&#xA;" Views="3" UpVotes="46" DownVotes="1" AccountId="3341901" />
  <row Id="7" Reputation="101" CreationDate="2015-03-17T15:06:07.907" DisplayName="Shog9" LastAccessDate="2022-08-23T21:09:17.643" WebsiteUrl="http://shog9.com" Location="Frontier, WA, USA" AboutMe="&lt;p&gt;Well, fancy seeing you here!&lt;/p&gt;&#xA;&lt;p&gt;I work for &lt;a href=&quot;https://www.enterprisedb.com/&quot; rel=&quot;nofollow noreferrer&quot;&gt;EDB&lt;/a&gt;, assisting a bunch of really skilled people share their PostgreSQL expertise and experience with the world.&lt;/p&gt;&#xA;&lt;p&gt;Before that, I worked here - at Stack Overflow / Stack Exchange. Here, my duties &lt;em&gt;also&lt;/em&gt; involved helping a bunch of really skilled people share their knowledge. So you might encounter some posts from me which provide guidance and advice for the folks using this network of Q&amp;amp;A sites.&lt;/p&gt;&#xA;&lt;p&gt;I tend to write as though I know what I'm writing about... And sometimes I do... But, you should always use your own judgement: question everything, read the links to supporting materials, and draw your own conclusions. I'm usually happy to discuss anything I've written, so don't hesitate to raise concerns or point out when something is unclear!&lt;/p&gt;&#xA;&lt;p&gt;&lt;sup&gt;&lt;sub&gt;&lt;strong&gt;Whatsoever thy hand findeth to do, do it with thy might; for there is no work, nor device, nor knowledge, nor wisdom, in the grave, whither thou goest.&lt;/strong&gt;&lt;/sub&gt;&lt;/sup&gt;&lt;/p&gt;&#xA;" Views="0" UpVotes="13" DownVotes="0" AccountId="620" />
  <row Id="8" Reputation="101" CreationDate="2015-03-17T15:06:48.457" DisplayName="Gabe" LastAccessDate="2015-03-19T02:03:09.250" Views="0" UpVotes="1" DownVotes="0" AccountId="5960" />
  <row Id="9" Reputation="101" CreationDate="2015-03-17T15:07:35.310" DisplayName="mmmmmpie" LastAccessDate="2015-03-30T17:57:49.187" WebsiteUrl="http://[email protected]" Location="West Virginia" AboutMe="&lt;p&gt;Oracle DBA, Sys Admin, SQL Dev, and some radios and stuff.&lt;/p&gt;&#xA;" Views="0" UpVotes="0" DownVotes="0" AccountId="5034898" />
  <row Id="10" Reputation="3591" CreationDate="2015-03-17T15:07:41.150" DisplayName="drs" LastAccessDate="2022-01-12T20:43:49.603" Location="Rochester, NY, USA" AboutMe="&lt;p&gt;Engage in &lt;a href=&quot;http://woodworking.stackexchange.com&quot;&gt;Woodworking&lt;/a&gt;&lt;/p&gt;&#xA;" Views="57" UpVotes="2710" DownVotes="2" AccountId="1597949" />
  <row Id="11" Reputation="4798" CreationDate="2015-03-17T15:07:41.540" DisplayName="Steven" LastAccessDate="2022-08-17T13:54:36.727" WebsiteUrl="http://www.twitter.com/sberkovitz" Location="Toronto, Canada" Views="49" UpVotes="497" DownVotes="15" AccountId="79558" />
  <row Id="12" Reputation="101" CreationDate="2015-03-17T15:07:54.307" DisplayName="Mike" LastAccessDate="2015-03-17T15:07:54.307" Location="Earth, TX" Views="0" UpVotes="0" DownVotes="0" AccountId="154660" />
  <row Id="13" Reputation="631" CreationDate="2015-03-17T15:09:10.383" DisplayName="rgmrtn" LastAccessDate="2022-02-23T14:45:54.897" WebsiteUrl="" Location="62º 28'N, 114º 22' W" AboutMe="&lt;p&gt;sans tache; im certifiable&lt;/p&gt;&#xA;" Views="4" UpVotes="10" DownVotes="0" AccountId="3872225" />
  <row Id="14" Reputation="835" CreationDate="2015-03-17T15:09:14.263" DisplayName="Joe" LastAccessDate="2022-07-27T16:58:34.473" AboutMe="&lt;p&gt;SAS Programmer/Developer/Analyst/buzzword, when I'm not parenting a pair of rambunctious little boys.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;#SOreadytohelp&lt;/p&gt;&#xA;" Views="13" UpVotes="31" DownVotes="0" AccountId="1780022" />
  <row Id="15" Reputation="101" CreationDate="2015-03-17T15:09:19.203" DisplayName="Chris Farmer" LastAccessDate="2016-01-05T13:13:25.113" WebsiteUrl="http://cfarmerga.myopenid.com/" Location="Nashville, TN" AboutMe="Always happy, never blue." Views="0" UpVotes="3" DownVotes="0" AccountId="323" />
  <row Id="16" Reputation="173" CreationDate="2015-03-17T15:09:25.003" DisplayName="Markie" LastAccessDate="2017-02-14T14:31:50.920" Location="UK" AboutMe="&lt;p&gt;Website developer&lt;/p&gt;&#xA;" Views="2" UpVotes="3" DownVotes="0" AccountId="2321310" />
  <row Id="17" Reputation="101" CreationDate="2015-03-17T15:09:50.447" DisplayName="Lumi" LastAccessDate="2015-04-11T01:46:50.457" Views="0" UpVotes="0" DownVotes="0" AccountId="3780251" />
</users>

I am trying to parse the XML file above in order to load it into a CSV, and I get a NoneType error traceback. The XML files in the other questions about this error seem somehow different from mine, though I am not sure what an XML file is supposed to look like. Here is the code:

# Importing the required libraries
import xml.etree.ElementTree as Xet
import pandas as pd
  
cols = ["row Id", "Reputation", "CreationDate", "DisplayName", "LastAccessDate", "WebsiteUrl", "Location", "AboutMe", "Views", "UpVotes", "DownVotes", "AccountId"]
rows = []
  
# Parsing the XML file
xmlparse = Xet.parse('Users.xml')
root = xmlparse.getroot()

for r in root.findall("users"):
    
    print(rows)
        
    rowId = i.find("row Id").text
    print('Row Id: ' + str(rowId))
    reputation = i.find("Reputation").text
    creationDate = i.find("Creationdate").text
    displayname = i.find("Displayname").text
    lastAccessDate = i.find("LastAccessDate").text
    websiteUrl = i.find("WebsiteUel").text
    location = i.find("Location").text
    aboutMe = i.find("AboutMe").text
    views = i.find("Views").text
    upVotes = i.find("UpVotes").text
    downVotes = i.find("DownVotes").text
    accountId = i.find("AccountId").text
            
    rows.append({"rowId": rowId,
                "Reputation": reputation,
                "Creationdate": creationdate,
                "Displayname": displayname,
                "LastAccessDate": lastAccessDate,
                "WebsiteUrl": websiteUrl,
                "Location": location,
                "AboutMe": aboutMe,
                "Views": views,
                "UpVotes": upVotes,
                "DownVotes": downVotes,
                "AccountId": accountId
                })


            
    
df = pd.DataFrame(rows, columns=cols)
  
# Writing dataframe to csv
df.to_csv('users1.csv')

Solution

  • Are you sure you need a CSV as output and not a spreadsheet? The AboutMe field holds HTML content.

    Anyway, since you need all the fields, there is no need to hardcode them:

    import csv
    import xml.etree.ElementTree as ET
    from itertools import chain
    
    root = ET.parse("Users.xml").getroot()

    # Each <row> stores its values as attributes; turn each one into a dict.
    data = [dict(r.items()) for r in root.findall(".//row")]  # or simply iterate over root

    with open("Users.csv", mode="w", newline="", encoding="utf-8") as f:
        # Union of all attribute names in first-seen order (some rows omit fields).
        flds = list(dict.fromkeys(chain.from_iterable(r.keys() for r in root)))
        wr = csv.DictWriter(f, delimiter=",", fieldnames=flds)  # adjust if needed

        wr.writeheader()
        wr.writerows(data)
                 
    

    Or, since you're already using pandas, you can simply call read_xml and then write a CSV (or a spreadsheet):

    pd.read_xml("Users.xml").to_csv("Users.csv") # or to_excel("Users.xlsx")
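
As for why the code in the question fails: in this dump every value (Id, Reputation, DisplayName, ...) is an attribute of a <row> element, not a child element, and the root element is itself <users>. So root.findall("users") matches nothing, and a call such as find("Reputation") on a row returns None, which is the likely source of a "'NoneType' object has no attribute 'text'" error. The loop also declares r but uses i, and names like "Creationdate" and "WebsiteUel" do not match the attribute names in the file. Below is a minimal sketch of the hand-written loop using Element.get. It assumes a small inline sample in place of Users.xml and only four columns; SAMPLE, COLS and rows_from_xml are illustrative names, not part of the original code.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical two-row sample standing in for Users.xml.
SAMPLE = """<users>
  <row Id="-1" Reputation="1" DisplayName="Community" Location="on the server farm" />
  <row Id="4" Reputation="1" DisplayName="Conor Semler" />
</users>"""

# Illustrative subset of the attribute names found in the file.
COLS = ["Id", "Reputation", "DisplayName", "Location"]


def rows_from_xml(root):
    """Return one dict per <row>, reading values from attributes."""
    rows = []
    # The root element is <users> itself, so look for its <row> children.
    for row in root.findall("row"):
        # The values are XML attributes, so find(...) would return None here;
        # get() reads an attribute and falls back to "" when it is absent.
        rows.append({col: row.get(col, "") for col in COLS})
    return rows


root = ET.fromstring(SAMPLE)
rows = rows_from_xml(root)

# Write the rows as CSV to an in-memory buffer (a file object works the same way).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLS)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

With the real file, ET.parse("Users.xml").getroot() would presumably take the place of ET.fromstring(SAMPLE), and COLS would list all twelve attribute names.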