I'm iterating through all the data at this webpage (sample XML below), and I'm confused about exactly how to get the required values.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet title="XSL_formatting" type="text/xsl" href="/i/xml/xsl_formatting_rss.xml"?>
<rss xmlns:blogChannel="http://backend.userland.com/blogChannelModule" version="2.0">
<channel>
<title>Ariana Resources News</title>
<link>http://www.iii.co.uk/investment/detail?code=cotn:AAU.L&amp;display=news</link>
<description />
<item>
<title>Ariana Resources PLC - Environmental Impact Assessment Submitted for Kiziltepe</title>
<link>http://www.iii.co.uk/investment/detail?code=cotn:AAU.L&amp;display=news&amp;action=article&amp;articleid=9084833&amp;from=rss</link>
<description>Some Article information</description>
<pubDate>Fri, 30 Aug 2013 07:00:00 GMT</pubDate>
</item>
<item>
<title>Ariana Resources PLC - Directors' Dealings and Holding in Company</title>
<link>http://www.iii.co.uk/investment/detail?code=cotn:AAU.L&amp;display=news&amp;action=article&amp;articleid=9053338&amp;from=rss</link>
<description>Some Article information</description>
<pubDate>Wed, 31 Jul 2013 07:00:00 GMT</pubDate>
</item>
<item>
<title>Ariana Resources PLC - Directorship Changes</title>
<link>http://www.iii.co.uk/investment/detail?code=cotn:AAU.L&amp;display=news&amp;action=article&amp;articleid=9046582&amp;from=rss</link>
<description>Some Article information</description>
<pubDate>Wed, 24 Jul 2013 09:31:00 GMT</pubDate>
</item>
<item>
<title>Ariana Resources PLC - Ariana Resources plc : Capital Reorganisation</title>
<link>http://www.iii.co.uk/investment/detail?code=cotn:AAU.L&amp;display=news&amp;action=article&amp;articleid=9038706&amp;from=rss</link>
<description>Some Article information</description>
<pubDate>Wed, 24 Jul 2013 09:31:00 GMT</pubDate>
</item>
</channel>
</rss>
I've had a look at the dom4j quickstart guide, although I suspect I'm just not quite getting it.
How can I iterate in such a fashion that I:
At this point I've got the below, and I think it's very wrong on the second loop... any help is hugely appreciated:
//Create a null Document Object
Document theXML = null;
//Get the document of the XML and assign to Document object
theXML = parseXML(url);
//Place the root element of theXML into a variable
Element root = theXML.getRootElement();
// iterate through child elements of root
for ( Iterator i = root.elementIterator(); i.hasNext(); ) {
    Element element = (Element) i.next();
    // do something
    // iterate through child elements of root with element name "item"
    for ( Iterator j = root.elementIterator( "item" ); j.hasNext(); ) {
        Element foo = (Element) j.next();
        String rnsHeadline = "";
        String rnsLink = "";
        String rnsFullText = "";
        String rnsConstituentName = "";
        Rns rns = new Rns(null, null, null, null);
    }
}
You can do this with dom4j's XPath support:
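// Imports this fragment needs (at the top of the file)
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.List;
import java.util.Locale;
import org.dom4j.Node;
import org.joda.time.DateTime;

// Select every item element under rss/channel
List<? extends Node> items =
    (List<? extends Node>) theXML.selectNodes("//rss/channel/item");
// RFC 822 date format used by the RSS pubDate element
DateFormat dateFormatterRssPubDate =
    new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss Z", Locale.ENGLISH);
// today started at this time
DateTime timeTodayStartedAt = new DateTime().withTimeAtStartOfDay();
for (Node node : items) {
    // read the text of this item's pubDate child
    String pubDate = node.valueOf("pubDate");
    // parse() throws ParseException, which the enclosing method must handle
    DateTime date = new DateTime(dateFormatterRssPubDate.parse(pubDate));
    if (date.isAfter(timeTodayStartedAt)) {
        // it's today, do something!
        System.out.println("Today: " + date);
    } else {
        System.out.println("Not today: " + date);
    }
}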
dom4j needs the Jaxen dependency on the classpath for XPath to work. I used Joda-Time to compare the dates, as it's a lot cleaner than using Java's built-in date classes. Here's the full example.
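Independently of that example, if you are on Java 8 or later and would rather not add Joda-Time, the same kind of check can be sketched with `java.time`. This is only a sketch under some assumptions: the class and method names (`PubDateCheck`, `parsePubDate`, `isOnDay`) are made up for illustration, and it compares calendar days in UTC rather than using the "after local midnight" test from the snippet above.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of dom4j or of the original answer.
final class PubDateCheck {

    // Parses an RSS pubDate such as "Fri, 30 Aug 2013 07:00:00 GMT".
    static ZonedDateTime parsePubDate(String pubDate) {
        return ZonedDateTime.parse(pubDate, DateTimeFormatter.RFC_1123_DATE_TIME);
    }

    // True when the pubDate falls on the given calendar day (assumption: compared in UTC).
    static boolean isOnDay(String pubDate, LocalDate day) {
        LocalDate published = parsePubDate(pubDate)
                .withZoneSameInstant(ZoneOffset.UTC)
                .toLocalDate();
        return published.equals(day);
    }

    public static void main(String[] args) {
        String pubDate = "Fri, 30 Aug 2013 07:00:00 GMT";
        LocalDate today = LocalDate.now(ZoneOffset.UTC);
        String label = isOnDay(pubDate, today) ? "Today: " : "Not today: ";
        System.out.println(label + parsePubDate(pubDate));
    }
}
```

Inside the loop you would pass `node.valueOf("pubDate")` to `isOnDay` along with today's date.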
Note that dom4j is not really maintained, so you might also be interested in this discussion about dom4j alternatives.
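One such alternative needs no extra dependency at all: the DOM parser and XPath engine that ship with the JDK (`javax.xml.parsers` and `javax.xml.xpath`). The sketch below is not the dom4j approach from above, just a rough equivalent. The class name `RssItems`, the method `readItems`, and the tiny inline feed are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper built only on JDK classes.
final class RssItems {

    // Returns {title, link, pubDate} for every item element under /rss/channel.
    static List<String[]> readItems(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList items = (NodeList) xpath.evaluate(
                    "/rss/channel/item", doc, XPathConstants.NODESET);
            List<String[]> result = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Node item = items.item(i);
                // Relative XPath expressions are evaluated against the current item.
                result.add(new String[] {
                    xpath.evaluate("title", item),
                    xpath.evaluate("link", item),
                    xpath.evaluate("pubDate", item)
                });
            }
            return result;
        } catch (Exception e) {
            // Wrap checked parser/XPath exceptions so callers stay simple.
            throw new IllegalStateException("Could not read RSS items", e);
        }
    }

    public static void main(String[] args) {
        // Made-up miniature feed; a real one would be downloaded from the URL.
        String xml = "<rss version=\"2.0\"><channel><title>Feed</title>"
                + "<item><title>First</title>"
                + "<link>http://example.com/?a=1&amp;b=2</link>"
                + "<pubDate>Fri, 30 Aug 2013 07:00:00 GMT</pubDate></item>"
                + "</channel></rss>";
        for (String[] item : readItems(xml)) {
            System.out.println(item[0] + " | " + item[1] + " | " + item[2]);
        }
    }
}
```

The three strings per item correspond to the headline, link and date the question wants to collect; the description could be read the same way with `xpath.evaluate("description", item)`.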