How to get the absolute node path in XML using XPath and Ruby?


I want to extract the absolute path from a node up to the root and print it to the console or write it to a file. My current attempt:

require "rexml/document"

include REXML

def get_path(xml_doc, key)
  XPath.each(xml_doc, key) do |node|
    puts "\"#{node}\""
    XPath.each(node, '(ancestor::#node)') do |el|
      #  puts  el
    end
  end
end

test_doc = Document.new <<EOF
  <root>
   <level1 key="1" value="B">
     <level2 key="12" value="B" />
     <level2 key="13" value="B" />
   </level1>
  </root>
EOF

get_path test_doc, "//*/[@key='12']"

The problem is that it prints only the matched node, "<level2 value='B' key='12'/>". The desired output is <root><level1><level2 value='B' key='12'/> (the format can differ; the main goal is to have the full path). I have only basic knowledge of XPath and would appreciate any guidance on where to look and how to achieve this.


Solution

  • If you're set on REXML, here's a REXML solution:

    require 'rexml/document'
    
    test_doc = REXML::Document.new <<EOF
      <root>
        <level1 key="1" value="B">
          <level2 key="12" value="B" />
          <level2 key="13" value="B" />
        </level1>
      </root>
    EOF
    
    def get_path(xml_doc, key)
      node = REXML::XPath.first( xml_doc, key )
      path = []
      while node && node.parent   # node is nil when the XPath matches nothing
        path << node
        node = node.parent
      end
      path.reverse
    end
    
    path = get_path( test_doc, "//*[@key='12']" )
    p path.map{ |el| el.name }.join("/")
    #=> "root/level1/level2"
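
A path made of names alone cannot tell the two level2 siblings apart. As a rough sketch of one way around that, the same parent walk can add a 1-based position to each step. The helper name indexed_path is hypothetical (it is not part of REXML), and the sketch assumes the matched node is an element rather than text or an attribute.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of REXML): builds an indexed absolute path
# such as "/root[1]/level1[1]/level2[2]" for the given element.
def indexed_path(element)
  steps = []
  node = element
  # Stop at the document itself, whose parent is nil.
  while node && node.parent
    # Element siblings that share this node's name, in document order.
    siblings = node.parent.children.select do |child|
      child.is_a?(REXML::Element) && child.name == node.name
    end
    position = siblings.index { |sibling| sibling.equal?(node) } + 1
    steps << "#{node.name}[#{position}]"
    node = node.parent
  end
  "/" + steps.reverse.join("/")
end

doc = REXML::Document.new(<<~XML)
  <root>
    <level1 key="1" value="B">
      <level2 key="12" value="B" />
      <level2 key="13" value="B" />
    </level1>
  </root>
XML

target = REXML::XPath.first(doc, "//*[@key='13']")
puts indexed_path(target)   # prints /root[1]/level1[1]/level2[2]
```

The position counts only same-named element siblings, which mirrors how a step like level2[2] is normally read in XPath, so the resulting string should be usable as an expression in its own right.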
    

    Or, if you want to use the same get_path implementation from the other answer, you can monkeypatch REXML to add an ancestors method:

    class REXML::Child
      def ancestors
        ancestors = []
    
        # Presumably you don't want the node included in its list of ancestors
        # If you do, change the following line to    node = self
        node = self.parent
    
        # Presumably you want to stop at the root node, and not its owning document
        # If you want the document included in the ancestors, change the following
        # line to just    while node
        while node && node.parent   # guard: a detached node or document has no parent
          ancestors << node
          node = node.parent
        end
    
        ancestors.reverse
      end
    end
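
For illustration, here is a minimal sketch of how such an ancestors method might be used. The other answer's get_path is not reproduced on this page, so the get_path below is an assumption about its general shape, not a copy of it.

```ruby
require 'rexml/document'

# Reopen REXML::Child so every node gains an ancestors method.
# Returns ancestors from the root element down to the direct parent;
# the result is empty for a detached node or for the document itself.
class REXML::Child
  def ancestors
    list = []
    node = parent
    while node && node.parent
      list << node
      node = node.parent
    end
    list.reverse
  end
end

# Hypothetical get_path built on ancestors: returns "root/.../node",
# or nil when the XPath matches nothing.
def get_path(xml_doc, key)
  node = REXML::XPath.first(xml_doc, key)
  return nil unless node
  (node.ancestors + [node]).map(&:name).join("/")
end

doc = REXML::Document.new('<root><level1 key="1"><level2 key="12"/></level1></root>')
puts get_path(doc, "//*[@key='12']")   # prints root/level1/level2
p get_path(doc, "//*[@key='99']")      # prints nil
```

Reopening a library class affects every REXML node in the process, so this fits a small script better than shared library code.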