I have to parse an OAI-PMH XML file that looks like the following. I would like to iterate over all <record>
nodes in ListRecords.
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd" xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <responseDate>2010-12-30T10:46:39.654+08:00</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://172.16.1.118/ahd/oai2.do</request>
  <ListRecords>
    <record>
      <header>
        <identifier>9010402101001001</identifier>
      </header>
      <metadata>
        <oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>9010402101001001</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>1509/1509</resumptionToken>
  </ListRecords>
</OAI-PMH>
But when I use XOM 1.2.5 to get those nodes, it always returns 0 nodes, no matter which method I use (query or getChildElements).
The following is the code I used in the Scala interpreter:
scala> import nu.xom.Builder
import nu.xom.Builder
scala> val builder = new Builder
builder: nu.xom.Builder = nu.xom.Builder@6682d439
scala> val document = builder.build(new java.io.File("/home/brianhsu/qqq.xml"))
document: nu.xom.Document = [nu.xom.Document: OAI-PMH]
scala> document.query("//record").size
res0: Int = 0
scala> document.query("//ListRecords").size
res1: Int = 0
scala> document.getRootElement.getChildElements("ListRecords").size
res2: Int = 0
I have no idea why I cannot get ListRecords and record from the XML. Did I miss something?
I'll wager that it is an xmlns issue. Have you tried passing the namespace URI as the second argument? Try:
document.getRootElement
.getChildElements("ListRecords",
"http://www.openarchives.org/OAI/2.0/").size
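A fuller sketch of iterating over every <record> with the namespace-aware overloads might look like the following. It assumes XOM 1.2.x is on the classpath and, to stay self-contained, parses a trimmed copy of the response from an inline string via java.io.StringReader instead of the original file. The object name ListRecordsByNamespace and the helper recordIdentifiers are hypothetical names chosen for illustration.

```scala
import java.io.StringReader
import nu.xom.Builder

object ListRecordsByNamespace {
  // Default namespace declared on <OAI-PMH>; unprefixed descendants inherit it.
  val OaiNs = "http://www.openarchives.org/OAI/2.0/"

  // Trimmed copy of the OAI-PMH response from the question (one record only).
  val xml =
    """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
      |  <ListRecords>
      |    <record>
      |      <header><identifier>9010402101001001</identifier></header>
      |    </record>
      |    <resumptionToken>1509/1509</resumptionToken>
      |  </ListRecords>
      |</OAI-PMH>""".stripMargin

  // Returns the <identifier> text of each <record> under <ListRecords>.
  def recordIdentifiers(xmlText: String): Seq[String] = {
    val root = new Builder().build(new StringReader(xmlText)).getRootElement
    // The two-argument overloads match on namespace URI as well as local name;
    // the one-argument overloads only match elements in no namespace.
    val listRecords = root.getFirstChildElement("ListRecords", OaiNs)
    if (listRecords == null) Seq.empty
    else {
      val records = listRecords.getChildElements("record", OaiNs)
      (0 until records.size).map { i =>
        val header = records.get(i).getFirstChildElement("header", OaiNs)
        header.getFirstChildElement("identifier", OaiNs).getValue
      }
    }
  }

  def main(args: Array[String]): Unit =
    recordIdentifiers(xml).foreach(println)
}
```

getFirstChildElement returns null when no child matches, which is why the sketch checks for it before descending into ListRecords.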
Basically, when an XML document declares a default namespace (xmlns="..."), many XML APIs require you to supply that namespace URI to look an element up, even though the element name carries no prefix in the document itself.
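One way to see this directly is to ask XOM which namespace the root element is in, then compare the one- and two-argument lookups. This is a minimal sketch assuming XOM 1.2.x on the classpath and a trimmed inline copy of the document; the object name DefaultNamespaceDemo is hypothetical.

```scala
import java.io.StringReader
import nu.xom.Builder

object DefaultNamespaceDemo {
  // Trimmed document: the only namespace declaration is the default one.
  val xml =
    """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
      |  <ListRecords><record/></ListRecords>
      |</OAI-PMH>""".stripMargin

  val root = new Builder().build(new StringReader(xml)).getRootElement

  def main(args: Array[String]): Unit = {
    // The name has no prefix, yet the element is not in "no namespace".
    println(root.getQualifiedName) // OAI-PMH
    println(root.getNamespaceURI)  // http://www.openarchives.org/OAI/2.0/
    // Lookup without a namespace misses the element; with the URI it is found.
    println(root.getChildElements("ListRecords").size)                        // 0
    println(root.getChildElements("ListRecords", root.getNamespaceURI).size)  // 1
  }
}
```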
(This can also be done with an XPathContext object, as Brian Hsu illustrated.)
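A sketch of the XPathContext route, assuming XOM 1.2.x on the classpath and a trimmed inline copy of the document (the object name QueryWithXPathContext is hypothetical). The prefix oai is an arbitrary choice: in XPath 1.0 an unprefixed name test means "no namespace", so the query has to use some prefix bound to the URI that the document declares as its default namespace.

```scala
import java.io.StringReader
import nu.xom.{Builder, XPathContext}

object QueryWithXPathContext {
  // Trimmed copy of the OAI-PMH response from the question.
  val xml =
    """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
      |  <ListRecords>
      |    <record><header><identifier>9010402101001001</identifier></header></record>
      |  </ListRecords>
      |</OAI-PMH>""".stripMargin

  // Bind an arbitrary prefix ("oai") to the document's default namespace URI.
  val context = new XPathContext("oai", "http://www.openarchives.org/OAI/2.0/")

  val document = new Builder().build(new StringReader(xml))

  def main(args: Array[String]): Unit = {
    // Unprefixed name test: looks for <record> in no namespace, finds nothing.
    println(document.query("//record").size) // 0
    // Prefixed name test resolved through the context: finds the record.
    val records = document.query("//oai:record", context)
    println(records.size) // 1
    for (i <- 0 until records.size) {
      // Relative query from each record node, reusing the same context.
      val id = records.get(i).query("oai:header/oai:identifier", context).get(0)
      println(id.getValue) // 9010402101001001
    }
  }
}
```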