Tags: java, xml, sax, saxparser

Using the SAX characters method to parse PCDATA from an XML element


I'm using the SAX API to parse an XML document, but I'm struggling to store the PCDATA of the elements inside each location in the XML.

The Oracle SAX API docs show that characters() is used to read PCDATA from an element, but I'm not sure how it is supposed to be called.

In my current implementation, boolean flags signal when a certain element in the XML document has been encountered. The flags are set in startElement() as expected when an element is encountered.

I set a breakpoint on the boolean variable description in characters(), but the boolean isn't set to true until startElement() is called, so the PCDATA is never parsed.

My question is: how can I call characters() after the boolean values are set in startElement()?

This is my startElement(), which is called after characters():

public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException {
    if (qName.equals("location")) {
        location = true;

        System.out.println("Found a location...");
        try {
            //Read in the values for the attributes of the element <location>
            int locationID = Integer.parseInt(atts.getValue("id"));
            String locationName = atts.getValue("name");

            //Generate a new instance of Location on-the-fly using reflection. The statement Class.forName("gmit.Location").newInstance(); invokes the
            //Java Class Loader and then calls the null (default) constructor of Location.
            Location loc = (Location) Class.forName("gmit.Location").newInstance();
            loc.setId(locationID); //Now configure the Location object with an ID, Name, Description etc...
            loc.setName(locationName);
            loc.setDescription(locationDescription);

        } catch (Exception e) {
            e.printStackTrace();
        }

    } else if (qName.equals("description")) {
        description = true;
        //need to invoke the characters method here after the description
        //flag is set to true
        System.out.println("Found a description. You should tie this to the last location you encountered...");
    }
}

characters() is called as soon as the program starts, but it needs to be called after the boolean flags are set in the method above:

public void characters(char[] ch, int start, int length) throws SAXException {
    if (location) {

    } else if (description) {
        locationDescription = new String(ch, start, length);
        System.out.println("Description = " + locationDescription);
    }
}

Sample of one of the locations within the XML file:

<location id="1" name="Tiberius">
    <description>
    You are in the city of Tiberius. You see a long street with high buildings and a castle. You see an exit to the south.
    </description>
    <exit title="Desert" direction="S"/>
</location>

Solution

  • how can I call the characters() after the boolean values are set in startElement() ?

    You can't. The whole point of SAX parsing is that the parser calls your handler, you don't call the parser.
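To make the "parser calls your handler" point concrete, here is a minimal sketch using the JDK's built-in `SAXParserFactory`. The class name `CallbackOrderDemo` and the method `trace` are hypothetical; the handler just records the order in which the parser invokes `startElement`, `characters` and `endElement`, dropping whitespace-only text to keep the trace short.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class CallbackOrderDemo {

    // Records each callback in the order the parser makes it.
    static class TracingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("start:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) { // skip indentation-only whitespace
                events.add("text:" + text);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }
    }

    static List<String> trace(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TracingHandler handler = new TracingHandler();
        // We only hand the handler to the parser; the parser decides when each callback runs.
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.events;
    }
}
```

For `<location id="1" name="Tiberius"><description>City</description></location>` the trace should read `start:location`, `start:description`, `text:City`, `end:description`, `end:location`: the description's `startElement` always arrives before its text, so a flag set there is already true when `characters` runs.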

    Your characters method will be called each time character data is encountered in the document by the SAX parser. Your handler will need to decide whether this data is relevant (is it a location, a description, or something that can be ignored?) and if relevant store this data somewhere where it can be retrieved later.
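One way to decide relevance and keep the data for later is sketched below, under the assumption that the interesting elements contain only text (no child elements). Every `characters` call is appended to a `StringBuilder`, and the decision about what to keep is made when the element ends. Appending rather than assigning matters because the SAX `ContentHandler` contract allows a parser to deliver one run of text in several `characters` calls. `TextCollector` and `collect` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class TextCollector extends DefaultHandler {

    private final Set<String> relevant;                                   // element names worth keeping
    private final Map<String, List<String>> collected = new HashMap<>(); // element name -> texts, in document order
    private final StringBuilder buffer = new StringBuilder();

    TextCollector(Set<String> relevant) {
        this.relevant = relevant;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // a new element starts: forget any text seen before it
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so accumulate.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (relevant.contains(qName)) {
            collected.computeIfAbsent(qName, k -> new ArrayList<>()).add(buffer.toString().trim());
        }
        buffer.setLength(0); // text after this end tag belongs to no relevant element yet
    }

    static Map<String, List<String>> collect(String xml, Set<String> relevant) throws Exception {
        TextCollector handler = new TextCollector(relevant);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.collected; // retrieved after parsing has finished
    }
}
```

Text in elements not listed in `relevant` is simply discarded, which is the "something that can be ignored" case.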

    You've shown us the startElement method you are using. If you haven't done so already, you will also want to override endElement. You need to set the boolean values location and description to false in an endElement method, so that your SAX handler knows it is no longer inside a location or description element as appropriate.
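Applied to the `<location>`/`<description>` XML in the question, a handler along these lines could work; this is a sketch, not the asker's final code. Two things in the question's handler get in the way: `characters()` tests `location` first, and since `<description>` is nested inside `<location>` (and `location` is never reset) the description branch never runs; and `loc.setDescription(locationDescription)` executes at the start of `<location>`, before that location's description has been read. Here a hypothetical nested `Location` class with plain fields stands in for `gmit.Location`, `current != null` plays the role of the `location` flag, and `inDescription` plays the role of `description`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class LocationHandler extends DefaultHandler {

    // Hypothetical stand-in for the question's gmit.Location class.
    static class Location {
        int id;
        String name;
        String description;
    }

    private final List<Location> locations = new ArrayList<>();
    private final StringBuilder descriptionText = new StringBuilder();
    private Location current;      // non-null while inside <location>
    private boolean inDescription; // true only between <description> and </description>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("location")) {
            current = new Location();
            current.id = Integer.parseInt(atts.getValue("id"));
            current.name = atts.getValue("name");
        } else if (qName.equals("description")) {
            inDescription = true;
            descriptionText.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // We are also inside <location> here, so only the description flag is tested.
        if (inDescription) {
            descriptionText.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("description")) {
            inDescription = false; // we have left <description>
            if (current != null) {
                // Tie the description to the location it is nested in.
                current.description = descriptionText.toString().trim();
            }
        } else if (qName.equals("location")) {
            locations.add(current); // the location is complete now
            current = null;         // we have left <location>
        }
    }

    static List<Location> parse(String xml) throws Exception {
        LocationHandler handler = new LocationHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.locations;
    }
}
```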

    You haven't shown us a sample XML document. Perhaps you have something like this:

     <widgetList>
         <widget>
             <name>First widget</name>
             <location>Over there</location>
             <description>This is the first widget in the list</description>
         </widget>
         <widget>
             <name>Second widget</name>
             <location>Very far away</location>
             <description>This is the second widget in the list</description>
         </widget>
     </widgetList>
    

    If so, you may want to handle the end of the widget element as well. For example, this could take the last location and description the handler encountered, put them together in a Widget object and store this in some list inside the handler. At the end of the parsing you can then read the list of widgets from the handler.
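A minimal sketch of that idea for the widget XML above, assuming a hypothetical `Widget` value class and handler name (neither comes from any library): the text of `name`, `location` and `description` is remembered as each element ends, and `</widget>` combines them into one object.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class WidgetHandler extends DefaultHandler {

    // Hypothetical value class holding one <widget>'s data.
    static final class Widget {
        final String name;
        final String location;
        final String description;

        Widget(String name, String location, String description) {
            this.name = name;
            this.location = location;
            this.description = description;
        }
    }

    private final List<Widget> widgets = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    // Last values seen inside the current <widget>.
    private String name;
    private String location;
    private String description;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("widget")) {
            name = null; // do not carry values over from the previous widget
            location = null;
            description = null;
        }
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (qName) {
            case "name":        name = text.toString().trim(); break;
            case "location":    location = text.toString().trim(); break;
            case "description": description = text.toString().trim(); break;
            case "widget":      widgets.add(new Widget(name, location, description)); break;
            default:            break;
        }
        text.setLength(0);
    }

    static List<Widget> parse(String xml) throws Exception {
        WidgetHandler handler = new WidgetHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        // Read the list back from the handler once parsing has finished.
        return handler.widgets;
    }
}
```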