Tags: java, xml, parsing, openstreetmap, stax

XML Parsing with the StAX API


I am trying to parse the XML structure of OpenStreetMap data using StAX. In my implementation I use XMLStreamConstants.START_ELEMENT and XMLStreamConstants.END_ELEMENT to recognize elements.

The OpenStreetMap structure has elements such as tag, which can describe either a node or a way. Here is an example of the structure:

      <node id="2311741639" ... lat="50.7756648" lon="6.0844948">
       <tag k="entrance" v="yes"/>
      </node>
      <way id="4964449" visible="true" ... uid="67862">
       <nd ref="27290865"/>
        ...
       <tag k="highway" v="residential"/>
        ...
      </way>

How can I tell whether a tag element belongs to a node or to a way when the parser reads it?


Solution

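  • You could keep an ArrayDeque of the currently open elements and use it as a stack: when the parser reaches a tag, the element pushed just before it is its parent. If the depth of your hierarchy is small, you could even build a temporary DOM-like structure instead.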

    Here's an example with ArrayDeque...

    Assuming this XML file named stuff.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    
    <stuff>
    
    <node id="2311741639" lat="50.7756648" lon="6.0844948">
        <tag k="entrance" v="yes"/>
    </node>
    
    <way id="4964449" visible="true" uid="67862">
        <nd ref="27290865"/>
        <tag k="highway" v="residential"/>
    </way>
    
    </stuff>
    

    Assuming the file is located in /my/path/

    Here is the code (try/catch Java 6 style):

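    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.ArrayDeque;
    import java.util.Iterator;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.XMLStreamReader;

    public class StaxDemo { // class name is arbitrary
        public static void main(String[] args) {
            InputStream is = null;
            XMLStreamReader reader = null;
            try {
                is = new FileInputStream(new File("/my/path/stuff.xml"));
                XMLInputFactory xif = XMLInputFactory.newInstance();
                reader = xif.createXMLStreamReader(is);
                // Names of the elements that are currently open, innermost last
                ArrayDeque<String> nodes = new ArrayDeque<String>();
                while (reader.hasNext()) {
                    int current = reader.next();
                    switch (current) {
                        case XMLStreamConstants.START_ELEMENT: {
                            nodes.add(reader.getLocalName());
                            System.out.println("START: " + nodes.getLast());
                            if (nodes.size() > 1) {
                                Iterator<String> iterator = nodes.descendingIterator();
                                // skip the element itself, it was printed above
                                iterator.next();
                                // print the ancestors, nearest parent first
                                while (iterator.hasNext()) {
                                    System.out.println("\t in " + iterator.next());
                                }
                            }
                            break;
                        }
                        case XMLStreamConstants.END_ELEMENT: {
                            // the element being closed is always the innermost one
                            System.out.println("END: " + nodes.removeLast());
                            Iterator<String> iterator = nodes.descendingIterator();
                            while (iterator.hasNext()) {
                                System.out.println("\t in " + iterator.next());
                            }
                            break;
                        }
                    }
                }
            } catch (FileNotFoundException fnfe) {
                fnfe.printStackTrace();
            } catch (XMLStreamException xse) {
                xse.printStackTrace();
            } finally {
                // XMLStreamReader.close() does not close the underlying stream,
                // so close both, each in its own try so one failure cannot leak the other.
                if (reader != null) {
                    try {
                        reader.close();
                    } catch (XMLStreamException xse) {
                        xse.printStackTrace();
                    }
                }
                if (is != null) {
                    try {
                        is.close();
                    } catch (IOException ioe) {
                        ioe.printStackTrace();
                    }
                }
            }
        }
    }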
    

    Output:

    START: stuff
    START: node
         in stuff
    START: tag
         in node
         in stuff
    END: tag
         in node
         in stuff
    END: node
         in stuff
    START: way
         in stuff
    START: nd
         in way
         in stuff
    END: nd
         in way
         in stuff
    START: tag
         in way
         in stuff
    END: tag
         in way
         in stuff
    END: way
         in stuff
    END: stuff
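
To tie this back to the original question, here is a minimal sketch of how the same stack idea could answer "does this tag belong to a node or a way?". It is only an illustration: the class name TagParentDemo, the method describeTags, the ownerId variable and the output format are made up for this example, and the XML is read from a String instead of a file so the snippet is self-contained. When a tag start element arrives, the top of the stack is its parent.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class TagParentDemo { // hypothetical name, for illustration only

    // Returns one line per <tag>, naming the element that encloses it.
    static List<String> describeTags(String xml) {
        List<String> result = new ArrayList<String>();
        Deque<String> open = new ArrayDeque<String>(); // currently open elements, innermost on top
        String ownerId = null; // id of the enclosing <node> or <way>, if any
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("node".equals(name) || "way".equals(name)) {
                        ownerId = reader.getAttributeValue(null, "id");
                    } else if ("tag".equals(name)) {
                        // The tag itself is not pushed yet, so the top of the stack is its parent.
                        String parent = open.peek();
                        result.add(parent + " " + ownerId + ": "
                                + reader.getAttributeValue(null, "k") + "="
                                + reader.getAttributeValue(null, "v"));
                    }
                    open.push(name);
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String closed = open.pop();
                    if ("node".equals(closed) || "way".equals(closed)) {
                        ownerId = null; // left the owning element
                    }
                }
            }
            reader.close(); // nothing else to release: the source is an in-memory String
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<stuff>"
                + "<node id=\"2311741639\" lat=\"50.7756648\" lon=\"6.0844948\">"
                + "<tag k=\"entrance\" v=\"yes\"/></node>"
                + "<way id=\"4964449\" visible=\"true\" uid=\"67862\">"
                + "<nd ref=\"27290865\"/><tag k=\"highway\" v=\"residential\"/></way>"
                + "</stuff>";
        for (String line : describeTags(xml)) {
            System.out.println(line);
        }
    }
}
```

For the sample document this should report the entrance tag under the node and the highway tag under the way. If you only ever need the direct parent, a single "current parent" variable would work too; the stack just generalizes to deeper nesting.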