python xml parsing sax

Missing string using Python xml.sax to parse XML file


I am trying to parse an XML file with xml.sax in Python 2.7.

Here is the XML file I am using:

<NS:Member>
<NS:Area fid='120410'>
<NS:Code>10021</NS:Code>
<NS:version>4</NS:version>
<NS:versionDate>2004-03-29</NS:versionDate>
<NS:theme>Buildings</NS:theme>
<NS:Value>42.826432</NS:Value>
<NS:changeHistory>
    <NS:changeDate>2002-09-26</NS:changeDate>
    <NS:reasonForChange>New</NS:reasonForChange>
</NS:changeHistory>
<NS:changeHistory>
    <NS:changeDate>2003-10-24</NS:changeDate>
    <NS:reasonForChange>Attributes</NS:reasonForChange>
</NS:changeHistory>
<NS:changeHistory>
    <NS:changeDate>2004-03-18</NS:changeDate>
    <NS:reasonForChange>Attributes</NS:reasonForChange>
</NS:changeHistory>
<NS:Group>Building</NS:Group>
<NS:make>Manmade</NS:make>
<NS:Level>50</NS:Level>
<NS:polygon>
    <NS2:Polygon srsName='NS2:BNG'>
    <NS2:Boundary>
        <NS2:LinearRing>
            <NS2:coordinates>383415.110,400491.900 383411.090,400485.570 383415.500,400482.770 383420.430,400490.530 383418.780,400491.580 383417.930,400490.240 383415.160,400491.980 383415.110,400491.900 
            </NS2:coordinates>
        </NS2:LinearRing>
    </NS2:Boundary>
    </NS2:Polygon>
</NS:polygon></NS:Area>
</NS:Member>

I am only interested in the ID (the fid attribute of NS:Area), Group, make and coordinates parts of the XML file.

And the code I use is:

import xml.sax

class MyHandler(xml.sax.ContentHandler):
    
    def __init__(self):
        self.__CurrentData = ""
        self.__ID = ""
        self.__Group = ""
        self.__make = ""
        self.__coordinates = []
        self.__coordString = ""
        
        
    def startElement(self, tag, attributes):
        self.__CurrentData = tag
        if tag == "NS:Area":
            self.__ID = attributes["fid"]
            print "ID: ", self.__ID
                           
            
    def endElement(self, tag):
        if self.__CurrentData == "NS:Group":
            print "Group: ", self.__Group
            
        elif self.__CurrentData == "NS:make":
            print "Make: ", self.__make
                                
        elif self.__CurrentData == "NS2:coordinates":
            print "coordinates: ", self.__coordString
                                
        self.__CurrentData = ""
        
            
    def characters(self, content):
        if self.__CurrentData == "NS:Area":
            self.__ID = content
        elif self.__CurrentData == "NS:Group":
            self.__Group = content
        elif self.__CurrentData == "NS:make":
            self.__make = content
        elif self.__CurrentData == "NS2:coordinates":
            self.__coordString = content

I expected to see the following output:

ID: 120410

Group: Building

Make: Manmade

coordinates: 383415.110,400491.900 383411.090,400485.570 383415.500,400482.770 383420.430,400490.530 383418.780,400491.580 383417.930,400490.240 383415.160,400491.980 383415.110,400491.900

However, what I've got is:

ID: 120410

Group: Building

Make: Manmade

coordinates:

where the coordinates are missing and have been replaced by a lot of spaces.

May I know what is wrong with my code?

Many thanks.


Solution

  • All

    Thanks for your help.

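    I just figured out what is going on: it is caused by the layout of the data file. The closing </NS2:coordinates> tag has to come immediately after the last coordinate, rather than on a new line. With the closing tag on its own line, the parser reports the trailing newline and indentation in a separate characters() call, and because characters() assigns rather than appends, that whitespace overwrites the coordinates already stored in self.__coordString.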

    Hope this helps other people who have the same problem.
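
    Moving the closing tag means editing the data file. An alternative that leaves the file alone is to collect every chunk passed to characters() and join the chunks in endElement(), since a SAX parser is allowed to deliver the text of one element in several calls. The following is only a minimal sketch of that idea, not the original code: the class name AccumulatingHandler, the WANTED tuple, the result dictionary and the shortened SAMPLE document are all made up for illustration, and it uses print() with a single argument so that it should run on both Python 2.7 and Python 3.

```python
import xml.sax


class AccumulatingHandler(xml.sax.ContentHandler):
    """Hypothetical handler: joins all text chunks of the wanted elements."""

    # Element names taken from the question's XML sample.
    WANTED = ("NS:Group", "NS:make", "NS2:coordinates")

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.result = {}
        self._current = None  # wanted element currently being read, or None
        self._chunks = []     # text chunks seen so far for that element

    def startElement(self, tag, attributes):
        if tag == "NS:Area":
            self.result["ID"] = attributes["fid"]
        if tag in self.WANTED:
            self._current = tag
            self._chunks = []

    def characters(self, content):
        # Append instead of assign: this may be called several times
        # for the text of a single element.
        if self._current is not None:
            self._chunks.append(content)

    def endElement(self, tag):
        if tag == self._current:
            # Join the chunks, then drop surrounding whitespace and newlines.
            self.result[tag] = "".join(self._chunks).strip()
            self._current = None


# Shortened, made-up version of the question's file; the closing
# </NS2:coordinates> tag is deliberately left on its own line.
SAMPLE = b"""<NS:Member>
<NS:Area fid='120410'>
<NS:Group>Building</NS:Group>
<NS:make>Manmade</NS:make>
<NS:polygon>
    <NS2:coordinates>383415.110,400491.900 383411.090,400485.570 
    </NS2:coordinates>
</NS:polygon></NS:Area>
</NS:Member>"""

handler = AccumulatingHandler()
xml.sax.parseString(SAMPLE, handler)
for key in ("ID", "NS:Group", "NS:make", "NS2:coordinates"):
    print("%s: %s" % (key, handler.result[key]))
```

    With this approach the position of the closing tag should no longer matter, because the trailing whitespace is appended to the coordinates and then stripped instead of replacing them.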