java, xml, xml-parsing, jdom, jdom-2

Java JDOM XML parsing


It's my first day with Java, and I'm trying to build a small XML parser for my websites so I can get a clean view of my sitemap.xml files. The code I'm using looks like this:

import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.net.URL;
import java.util.List;


import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

class downloadxml {
   public static void main(String[] args) throws IOException {

       String str = "http://www.someurl.info/sitemap.xml";
       URL url = new URL(str);
       InputStream is = url.openStream();
       int ptr = 0;
       StringBuilder builder = new StringBuilder();
       while ((ptr = is.read()) != -1) {
           builder.append((char) ptr);
       }
       String xml = builder.toString();

       org.jdom2.input.SAXBuilder saxBuilder = new SAXBuilder();
       try {
           org.jdom2.Document doc = saxBuilder.build(new StringReader(xml));
           System.out.println(xml);
           Element xmlfile = doc.getRootElement();
           System.out.println("ROOT -->"+xmlfile);
           List list = xmlfile.getChildren("url");
           System.out.println("LIST -->"+list);
       } catch (JDOMException e) {
           // handle JDOMException
       } catch (IOException e) {
           // handle IOException
       }

       System.out.println("===========================");

   }
}

When the code reaches

System.out.println(xml);

I get a clean print of the XML sitemap. When it reaches:

System.out.println("ROOT -->"+xmlfile);

Output:

ROOT -->[Element: <urlset [Namespace: http://www.sitemaps.org/schemas/sitemap/0.9]/>]

So it finds the root element as well. But for some reason, when the code asks for the children, it prints an empty list:

System.out.println("LIST -->"+list);

Output:

LIST -->[]

What should I do differently? Any pointers on how to get the children?

The XML looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
    <url>
        <loc>http://www.image.url</loc>
        <image:image>
            <image:loc>http://www.image.url/image.jpg</image:loc>
        </image:image>
        <changefreq>daily</changefreq>
    </url>
</urlset>

Solution

  • You've come a long way in a day.

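    Short answer: you are ignoring the namespace of your XML document. The root element declares a default namespace (xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"), so every unprefixed child such as url belongs to that namespace, and getChildren("url") only matches elements that are in no namespace. Change the line: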

    List list = xmlfile.getChildren("url");
    

    to

    // needs: import org.jdom2.Namespace;
    Namespace ns = Namespace.getNamespace("http://www.sitemaps.org/schemas/sitemap/0.9");
    List<Element> list = xmlfile.getChildren("url", ns);
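
A minimal sketch of the namespace-aware lookup, assuming JDOM 2 is on the classpath. The class name SitemapLocations, its helper methods, and the inline SAMPLE string (a trimmed copy of the sitemap from the question) are made up for illustration; only the two namespace URIs come from the question. It also reads the nested image:loc elements, which live in the second namespace.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.Namespace;
import org.jdom2.input.SAXBuilder;

// Hypothetical class name; requires the JDOM 2 jar on the classpath.
class SitemapLocations {

    // Namespace URIs copied from the sitemap shown in the question.
    static final Namespace SITEMAP_NS =
            Namespace.getNamespace("http://www.sitemaps.org/schemas/sitemap/0.9");
    static final Namespace IMAGE_NS =
            Namespace.getNamespace("image", "http://www.google.com/schemas/sitemap-image/1.1");

    // Inline stand-in for the downloaded sitemap, so the sketch runs offline.
    static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\"\n"
            + "        xmlns:image=\"http://www.google.com/schemas/sitemap-image/1.1\">\n"
            + "  <url>\n"
            + "    <loc>http://www.image.url</loc>\n"
            + "    <image:image>\n"
            + "      <image:loc>http://www.image.url/image.jpg</image:loc>\n"
            + "    </image:image>\n"
            + "    <changefreq>daily</changefreq>\n"
            + "  </url>\n"
            + "</urlset>\n";

    // Parses the XML text and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return new SAXBuilder().build(new StringReader(xml)).getRootElement();
        } catch (JDOMException | IOException e) {
            // Surface the failure instead of swallowing it in an empty catch block.
            throw new IllegalArgumentException("Could not parse sitemap", e);
        }
    }

    // Collects the text of every <loc> that is a direct child of a <url>.
    static List<String> pageLocations(Element root) {
        List<String> locs = new ArrayList<>();
        for (Element url : root.getChildren("url", SITEMAP_NS)) {
            locs.add(url.getChildText("loc", SITEMAP_NS));
        }
        return locs;
    }

    // Collects the text of every <image:loc> inside an <image:image>.
    static List<String> imageLocations(Element root) {
        List<String> locs = new ArrayList<>();
        for (Element url : root.getChildren("url", SITEMAP_NS)) {
            for (Element image : url.getChildren("image", IMAGE_NS)) {
                locs.add(image.getChildText("loc", IMAGE_NS));
            }
        }
        return locs;
    }

    public static void main(String[] args) {
        Element root = parseRoot(SAMPLE);
        System.out.println("no namespace: " + root.getChildren("url").size());
        System.out.println("pages:  " + pageLocations(root));
        System.out.println("images: " + imageLocations(root));
    }
}
```

Run against the sample, the getChildren("url") call without a namespace should report zero matches, while the namespace-aware helpers return the page and image URLs. The "image" prefix passed to Namespace.getNamespace is only a label; matching is done on the namespace URI.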
    

    You can also simplify the build step by letting SAXBuilder fetch and parse the URL itself, which makes the manual stream-reading loop unnecessary:

    org.jdom2.Document doc = saxBuilder.build("http://www.someurl.info/sitemap.xml");
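
Handing the URL to the builder is likely more than a convenience: the loop in the question casts each byte to a char, which only works for single-byte characters, whereas the parser decodes the stream using the encoding the document declares. The JDK-only sketch below illustrates the difference; the class name StreamDecoding and the sample text "café" are assumptions for illustration, not part of the question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical class name; uses only the JDK.
class StreamDecoding {

    // Same approach as the loop in the question: one byte becomes one char.
    static String castEachByte(byte[] data) {
        ByteArrayInputStream is = new ByteArrayInputStream(data);
        StringBuilder builder = new StringBuilder();
        int ptr;
        while ((ptr = is.read()) != -1) {
            builder.append((char) ptr);
        }
        return builder.toString();
    }

    // Decodes the bytes as UTF-8, so multi-byte characters stay intact.
    static String decodeUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café": the last character takes two bytes in UTF-8
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        System.out.println("cast matches original:   " + castEachByte(utf8).equals(original));
        System.out.println("decode matches original: " + decodeUtf8(utf8).equals(original));
    }
}
```

For a sitemap that contains only ASCII characters both approaches give the same string, so the problem only shows up once a URL or title contains a non-ASCII character.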