XPath and getChildElements both fail to find child elements in XOM


I have to parse an OAI-PMH XML file, which looks like the following. I would like to iterate over all <record> nodes in ListRecords.

<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd" xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <responseDate>2010-12-30T10:46:39.654+08:00</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://172.16.1.118/ahd/oai2.do</request>
  <ListRecords>
    <record>
      <header>
        <identifier>9010402101001001</identifier>
      </header>
      <metadata>
        <oai_dc:dc xsi:schemaLocationfiltered="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>9010402101001001</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>1509/1509</resumptionToken>
  </ListRecords>
</OAI-PMH>

But when I use XOM 1.2.5 to get those nodes, no matter which method I use (query or getChildElements), it always returns 0 nodes.

The following is the code I use in the Scala interpreter:

scala> import nu.xom.Builder
import nu.xom.Builder

scala> val builder = new Builder
builder: nu.xom.Builder = nu.xom.Builder@6682d439

scala> val document = builder.build(new java.io.File("/home/brianhsu/qqq.xml"))
document: nu.xom.Document = [nu.xom.Document: OAI-PMH]

scala> document.query("//record").size
res0: Int = 0

scala> document.query("//ListRecords").size
res1: Int = 0

scala> document.getRootElement.getChildElements("ListRecords").size
res2: Int = 0

I have no idea why I cannot get the ListRecords and record elements from the XML. Did I miss something?


Solution

  • I'll wager that it is an xmlns issue. Have you tried passing the namespace URI as the second parameter? Try:

     document.getRootElement
             .getChildElements("ListRecords", 
                               "http://www.openarchives.org/OAI/2.0/").size
    

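    Basically, when an XML document declares a default namespace, many XML libraries require that namespace to look an element up, even though the element carries no prefix in the document itself. Here the root element declares xmlns="http://www.openarchives.org/OAI/2.0/", so ListRecords and record belong to that namespace, while getChildElements("ListRecords") and the XPath expression //record only match elements that are in no namespace.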
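
    To iterate over every <record>, the same namespace URI has to be supplied at each level of the lookup. The following is a minimal sketch, assuming XOM is on the classpath. The object name, the OaiNs constant, the helper names, and the trimmed inline XML are illustrative and not part of the original post:

```scala
import nu.xom.Builder

object NamespaceAwareChildren {
  // Default namespace declared by xmlns="..." on the <OAI-PMH> root element.
  val OaiNs = "http://www.openarchives.org/OAI/2.0/"

  // Trimmed-down stand-in for the file in the question (assumed same structure).
  val sampleXml: String =
    """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
      |  <ListRecords>
      |    <record>
      |      <header>
      |        <identifier>9010402101001001</identifier>
      |      </header>
      |    </record>
      |    <resumptionToken>1509/1509</resumptionToken>
      |  </ListRecords>
      |</OAI-PMH>""".stripMargin

  // The one-argument overload only matches elements in no namespace,
  // which is why the question saw a count of 0.
  def unqualifiedCount(xml: String): Int =
    new Builder().build(xml, null).getRootElement
      .getChildElements("ListRecords").size

  // Every lookup names the namespace explicitly.
  def recordHeaderIds(xml: String): Seq[String] = {
    val root = new Builder().build(xml, null).getRootElement
    val listRecords = root.getFirstChildElement("ListRecords", OaiNs)
    val records = listRecords.getChildElements("record", OaiNs)
    // Index-based loop: Elements exposes size and get(i).
    (0 until records.size).map { i =>
      records.get(i)
        .getFirstChildElement("header", OaiNs)
        .getFirstChildElement("identifier", OaiNs)
        .getValue
    }
  }
}
```

    Calling NamespaceAwareChildren.recordHeaderIds on the real file's contents should yield one identifier per <record>. Note that getFirstChildElement returns null when nothing matches, so production code would want to guard those calls.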

    (This can also be done with an XPathContext object, as illustrated by Brian Hsu.)
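
    The answer referred to here is not reproduced on this page, so the following is only a hedged sketch of what the XPathContext route might look like, not that answer's code. It assumes XOM is on the classpath. The prefix "oai", the object name, and the inline XML are arbitrary choices made for illustration:

```scala
import nu.xom.{Builder, XPathContext}

object XPathWithContext {
  // Default namespace declared on the <OAI-PMH> root element.
  val OaiNs = "http://www.openarchives.org/OAI/2.0/"

  // Trimmed-down stand-in for the file in the question (assumed same structure).
  val sampleXml: String =
    """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
      |  <ListRecords>
      |    <record>
      |      <header>
      |        <identifier>9010402101001001</identifier>
      |      </header>
      |    </record>
      |    <resumptionToken>1509/1509</resumptionToken>
      |  </ListRecords>
      |</OAI-PMH>""".stripMargin

  // Unprefixed names in XPath 1.0 match only elements in no namespace,
  // so the query binds a prefix to the OAI-PMH namespace and uses it in every step.
  // The prefix exists only in the query; the document itself stays unprefixed.
  def recordIdentifiers(xml: String): Seq[String] = {
    val document = new Builder().build(xml, null)
    val context = new XPathContext("oai", OaiNs)
    val ids = document.query("//oai:record/oai:header/oai:identifier", context)
    (0 until ids.size).map(i => ids.get(i).getValue)
  }

  // Same document, same element, but without a context: nothing matches.
  def unprefixedCount(xml: String): Int =
    new Builder().build(xml, null).query("//record").size
}
```

    The XPath form is convenient when the elements of interest sit several levels deep, since one expression replaces a chain of getFirstChildElement calls.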