
How to get all leaf elements from an REXML element and save them into an array?


I have a Ruby REXML element like the one below:

<a_1>
  <Tests>
    <test enabled='1'>trans </test>
    <test enabled='1'>ac </test>
    <test enabled='1'>dc </test>
  </Tests>
  <Corners>
    <corner enabled='0'>default</corner>
    <corner enabled='1'>C0 </corner>
  </Corners>
</a_1>

I want to find all leaf elements, so the result should be:

<test enabled='1'>trans </test>
<test enabled='1'>ac </test>
<test enabled='1'>dc </test>
<corner enabled='0'>default</corner>
<corner enabled='1'>C0 </corner>

My code is:

require 'rexml/document' 
include  REXML

def getAllLeaf(xmlElement)
  if xmlElement.has_elements?
    xmlElement.elements.each {|e| 
      getAllLeaf(e)
    }
  else
    return xmlElement
  end
end

It works fine and did show the right outputs on screen. However, I ran into a problem when I tried to save the result to an Array from this recursive procedure. So I wonder if there is a way to save this output to a single array that can be used afterwards?

I eventually worked out a recursive way to do it. It is a little odd, but I would like to share it:

def getAllLeaf(eTop,aTemp=Element.new("LeafElements"))
  if eTop.has_elements?
    eTop.elements.each {|e| 
      getAllLeaf(e,aTemp)
    }
  else
    aTemp<< eTop.dup
  end
  return aTemp
end

Solution

  • It works fine and did show the right outputs on screen.

    In fact, the code as posted produces no output anywhere. In any case, your recursive function doesn't work, which you can see if you call your method on the element <Tests> when <Tests> looks like this:

      <Tests>
        <test enabled='1'>
          <HELLO>world</HELLO>
        </test>
        <test enabled='1'>ac </test>
        <test enabled='1'>dc </test>
      </Tests>
    

    Your recursive method doesn't work because when you write:

    xmlElement.elements.each {|e|
    

    the each() method returns the thing on its left, i.e. xmlElement.elements, no matter what the block returns. Given your XML, your recursive method is equivalent to:

    def getAllLeaf(xmlElement)
        xmlElement.elements.each {|e| 
          "blah"  #your code here has no effect on what each() returns.
        }
    end
    

    ...which is equivalent to:

    def getAllLeaf(xmlElement)
        return xmlElement.elements
    end
    
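    If you do want to keep the recursion, one way around the each() problem is to pass an Array down through the calls and return it at the end. The following is only a minimal sketch: collect_leaves and its leaves parameter are illustrative names, not part of REXML.

```ruby
require "rexml/document"

# Recursively collect every element that has no element children.
# `leaves` is the accumulator Array shared by all recursive calls.
def collect_leaves(element, leaves = [])
  if element.has_elements?
    element.elements.each { |child| collect_leaves(child, leaves) }
  else
    leaves << element
  end
  leaves # return the Array explicitly, not the result of each()
end

xml = <<~END_OF_XML
  <a_1>
    <Tests>
      <test enabled='1'>trans </test>
      <test enabled='1'>ac </test>
      <test enabled='1'>dc </test>
    </Tests>
    <Corners>
      <corner enabled='0'>default</corner>
      <corner enabled='1'>C0 </corner>
    </Corners>
  </a_1>
END_OF_XML

doc = REXML::Document.new(xml)
leaves = collect_leaves(doc.root)

# Print the name and text of each collected leaf.
leaves.each { |leaf| puts "#{leaf.name}: #{leaf.text}" }
```

    Because the method always ends with the accumulator itself, the caller gets a plain Array of REXML::Element objects back, whatever each() returns along the way.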

    Do you want to stick with recursion? It's much simpler to search all the elements for the elements with no children:

    require "rexml/document"
    include REXML
    
    xml = <<'END_OF_XML'
    <a_1>
      <Tests>
        <test enabled='1'>trans </test>
        <test enabled='1'>ac </test>
        <test enabled='1'>dc </test>
      </Tests>
      <Corners>
        <corner enabled='0'>default</corner>
        <corner enabled='1'>C0 </corner>
      </Corners>
    </a_1>
    END_OF_XML
    
    doc = Document.new xml
    root = doc.root
    
    XPath.each(root, "//*") do |element|
      if not element.has_elements?
        enabled = element.attributes['enabled'] 
        text = element.text
        puts "#{enabled} ... #{text}"
      end
    end
    
    --output:--
    1 ... trans 
    1 ... ac 
    1 ... dc 
    0 ... default
    1 ... C0 
    

    Or, if the leaf elements are the only elements that have the attribute "enabled", you could do this:

    XPath.each(root, "//*[@enabled]") do |element|
      enabled = element.attributes['enabled'] 
      text = element.text
      puts "#{enabled} ... #{text}"
    end
    

    There's even a cryptic XPath expression that will directly select elements without element children:

    XPath.each(root, "//*[not(*)]") do |element|
      enabled = element.attributes['enabled'] 
      text = element.text
      puts "#{enabled} ... #{text}"
    end
    
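    The loops above only print the leaves. Since the question asks for an array, here is a minimal sketch that stores them instead. It relies on XPath.match, which returns the matching nodes as an Array, and then drops every element that still has element children. The variable name leaves is just illustrative.

```ruby
require "rexml/document"

xml = <<~END_OF_XML
  <a_1>
    <Tests>
      <test enabled='1'>trans </test>
      <test enabled='1'>ac </test>
      <test enabled='1'>dc </test>
    </Tests>
    <Corners>
      <corner enabled='0'>default</corner>
      <corner enabled='1'>C0 </corner>
    </Corners>
  </a_1>
END_OF_XML

doc = REXML::Document.new(xml)
root = doc.root

# XPath.match returns an Array of every element in the document;
# reject then removes the ones that have element children.
leaves = REXML::XPath.match(root, "//*").reject(&:has_elements?)

# The Array can be reused later, e.g. to read the attributes.
leaves.each do |leaf|
  puts "#{leaf.attributes['enabled']} ... #{leaf.text}"
end
```

    The same pattern should work with the other XPath expressions shown above in place of "//*", in which case the reject step may no longer be needed.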

    Also, have you considered using the nokogiri gem? It's pretty much the de facto standard XML/HTML parser for Ruby.