
Hpricot-style "containers" method for Nokogiri? Selecting only certain node types


I'm navigating a document with CSS selectors in Ruby. I've found some CSS-selector bugs in Hpricot that are fixed in Nokogiri, so I want to switch over.

The one issue I'm having is figuring out how to get an array of all children that are "containers" (i.e., not text nodes). Hpricot provides this out of the box with its containers method.

So in Hpricot I could do:

    children = doc.select('*')[0].containers

But with Nokogiri, the only way I've found to get the same result is the following (and I'm not sure it behaves exactly the same way):

    children = doc.css('*')[0].children.to_a.keep_if {|x| x.type != Nokogiri::XML::Node::TEXT_NODE }

Is there a better way to do this?


Solution

  • To clarify: you want only the child elements, not the child text nodes? If so, here are three techniques:

    require 'nokogiri'
    doc = Nokogiri::XML "<r>no<a1><b1/></a1><a2>no<b2>hi</b2>mom</a2>no</r>"
    
    # If the element is uniquely selectable via CSS
    kids1 = doc.css('r > *')
    
    # ...or if we assume you found an element and want only its children
    some_node = doc.at('r')
    
    # One way to do it
    kids2 = some_node.children.grep(Nokogiri::XML::Element)
    
    # A geekier but shorter way
    kids3 = some_node.xpath('*')
    
    # Confirm that they're the same (converting the NodeSets to arrays)
    p [ kids1.to_a == kids2, kids2 == kids3.to_a ]
    #=> [true, true]
    
    p kids1.map(&:name), kids2.map(&:name), kids3.map(&:name)
    #=> ["a1", "a2"]
    #=> ["a1", "a2"]
    #=> ["a1", "a2"]