Tags: ruby, xml, xpath, xml-parsing, rexml

XPath-REXML-Ruby: Selecting multiple siblings/ancestors/descendants


This is my first post here. I have just started working with Ruby and am using REXML for some XML handling. Here is a small sample of my XML file:

  <record>
     <header>
        <identifier>oai:lcoa1.loc.gov:loc.gmd/g3195.ct000379</identifier>
        <datestamp>2004-08-13T15:32:50Z</datestamp>
        <setSpec>gmd</setSpec>
     </header>
     <metadata>
           <titleInfo>
              <title>Meet-konstige vertoning van de grote en merk-waardige zons-verduistering</title>
           </titleInfo>
     </metadata>
  </record>

My objective is to match the last numerical value in the <identifier> tag against a list of values that I have in an array. I have achieved this with the following code snippet:

ids = XPath.match(xmldoc, "//identifier[text()='oai:lcoa1.loc.gov:loc.gmd/"+mapid+"']")

Having found the particular identifier that I wish to investigate, I now want to go back up to <header>, select its sibling <metadata>, and then select <titleInfo> to get the value of the <title> node for that identifier.

I have looked at the XPath tutorials and expressions and many of the related questions on this website as well and learnt about axes and the different concepts such as ancestor/following sibling etc. However, I am really confused and cannot figure this out easily.

I was wondering if I could get any help, or if someone could point me towards an easy-to-read online resource.

Thank you.

UPDATE:

I have been trying various combinations of code such as:

idss = XPath.match(xmldoc, "//identifier[text()='oai:lcoa1.loc.gov:loc.gmd/"+mapid+"']/parent::header/following-sibling::metadata/child::mods/child::titleInfo/child::title")

The code runs without errors but does not return anything. I am wondering what I am doing wrong.


Solution

  • Here's one way to do it: use XPath to find the identifier, walk up to the enclosing record, then use a relative path to get the title:

    require 'rexml/document'
    include REXML
    
    xml=<<END
      <record>
        <header>
          <identifier>oai:lcoa1.loc.gov:loc.gmd/g3195.ct000379</identifier>
          <datestamp>2004-08-13T15:32:50Z</datestamp>
          <setSpec>gmd</setSpec>
        </header>
        <metadata>
          <titleInfo>
            <title>Meet-konstige</title>
          </titleInfo>
        </metadata>
      </record>
    END
    
    doc = Document.new(xml)
    mapid = "ct000379"
    text = "oai:lcoa1.loc.gov:loc.gmd/g3195.#{mapid}"

    # Find the matching <identifier>, then walk up: identifier -> header -> record
    identifier_nodes = XPath.match(doc, "//identifier[text()='#{text}']")
    record_node = identifier_nodes.first.parent.parent
    # Relative path from <record> down to the title
    record_node.elements['metadata/titleInfo/title'].text
    # => "Meet-konstige"
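
The same lookup can probably also be done in a single XPath expression. The sketch below reuses the expression from the question's update with the `child::mods` step removed, because the sample shown has no `mods` element between `metadata` and `titleInfo`. That missing element would explain the empty result, but it is an assumption based on the sample only. `SAMPLE_XML` and `titles_for` are hypothetical names introduced for this illustration; they are not part of REXML.

```ruby
require 'rexml/document'

# Trimmed copy of the sample record from the question.
SAMPLE_XML = <<~XML
  <record>
    <header>
      <identifier>oai:lcoa1.loc.gov:loc.gmd/g3195.ct000379</identifier>
    </header>
    <metadata>
      <titleInfo>
        <title>Meet-konstige</title>
      </titleInfo>
    </metadata>
  </record>
XML

# Hypothetical helper: returns the title text(s) for one full identifier string.
# Assumes <titleInfo> is a direct child of <metadata>, as in the sample above.
def titles_for(doc, identifier_text)
  path = "//identifier[text()='#{identifier_text}']" \
         "/parent::header/following-sibling::metadata/titleInfo/title"
  REXML::XPath.match(doc, path).map(&:text)
end

doc = REXML::Document.new(SAMPLE_XML)
puts titles_for(doc, "oai:lcoa1.loc.gov:loc.gmd/g3195.ct000379")
```

If the real file does wrap `titleInfo` in a `mods` element, that step has to stay in the path, and if `mods` is declared in a namespace the plain name may not match. Since the full file is not shown, treat that as a guess. Replacing `parent::header/following-sibling::metadata` with `ancestor::record/metadata` should select the same node.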